I. Introduction

We consider the problem of recovering the sparse signal vector $\mathbf{x} \in \mathbb{C}^{n_a}$ with support set $\mathcal{X}$
(containing the locations of the non-zero entries of $\mathbf{x}$) from $m$ linear measurements [2]

$$\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}. \tag{1}$$

Here, $\mathbf{A} \in \mathbb{C}^{m \times n_a}$ and $\mathbf{B} \in \mathbb{C}^{m \times n_b}$ are given and known dictionaries, i.e., matrices that are
possibly over-complete and whose columns have unit Euclidean norm. The vector $\mathbf{e} \in \mathbb{C}^{n_b}$ with
support set $\mathcal{E}$ represents the sparse interference. We investigate the following models for the
sparse signal vector $\mathbf{x}$ and sparse interference vector $\mathbf{e}$, and their support sets $\mathcal{X}$ and $\mathcal{E}$:

• The interference support set $\mathcal{E}$ is arbitrary, i.e., $\mathcal{E} \subseteq \{1, \ldots, n_b\}$ can be any subset of
cardinality $n_e$. In particular, $\mathcal{E}$ may depend upon the sparse signal vector $\mathbf{x}$ and/or the
dictionary $\mathbf{A}$, and hence, may also be chosen adversarially. The support set $\mathcal{X}$ of $\mathbf{x}$ is
chosen uniformly at random from all subsets of $\{1, \ldots, n_a\}$ with cardinality $n_x$.

• The support set $\mathcal{E}$ of the sparse interference vector $\mathbf{e}$ is chosen uniformly at random
from all subsets of $\{1, \ldots, n_b\}$ with cardinality $n_e$. The support set $\mathcal{X}$ is assumed to be
arbitrary and of size $n_x$.

• Both $\mathcal{X}$ and $\mathcal{E}$, the support sets of the signal and of the interference with size $n_x$ and $n_e$,
respectively, are chosen uniformly at random.

In addition, for each model on the support sets $\mathcal{X}$ and $\mathcal{E}$ we may or may not know either of the
support sets prior to recovery.
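The first support model above can be sketched in a few lines of Python. This is only an illustration: the helper names `random_support` and `sparse_vector`, the dimensions, the fixed seed, the Gaussian non-zero entries, and the hand-picked set `E` are all assumptions made for the example, and indices are 0-based rather than 1-based as in the text.

```python
import random


def random_support(n, k, rng):
    """Draw a support uniformly at random from all k-subsets of {0, ..., n-1}."""
    return sorted(rng.sample(range(n), k))


def sparse_vector(n, support, rng):
    """Return a length-n vector with Gaussian non-zero entries on `support`."""
    v = [0.0] * n
    for i in support:
        v[i] = rng.gauss(0.0, 1.0)
    return v


rng = random.Random(0)            # fixed seed, chosen only for repeatability
n_a, n_b, n_x, n_e = 64, 64, 5, 3

# X is drawn uniformly at random; E is "arbitrary", here a hand-picked set
# that could just as well have been chosen adversarially.
X = random_support(n_a, n_x, rng)
E = [0, 1, 2]
x = sparse_vector(n_a, X, rng)
e = sparse_vector(n_b, E, rng)
```

Swapping the roles of `X` and `E`, or drawing both with `random_support`, gives the second and third models.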

As discussed in [2], recovery of the sparse signal vector $\mathbf{x}$ from the sparsely corrupted
observation $\mathbf{z}$ in (1) is relevant in a large number of practical applications. In particular, the
restoration of saturated or clipped signals [3]-[5], the recovery of signals impaired by impulse
noise [6]-[8], and the removal of narrowband interference are all captured by the input-output
relation (1). Furthermore, the model (1) enables us to investigate sparsity-based super-resolution
and in-painting [9], [10], as well as signal separation [11], [12]. Hence, identifying the fundamental
limits on the recovery of the vector $\mathbf{x}$ from the sparsely corrupted observation $\mathbf{z}$ is of
significant practical interest.

Recovery guarantees for sparsely corrupted signals have been partially studied in [2], [3],
[13]-[20]. In particular, [2], [13] investigated coherence-based recovery guarantees for arbitrary
support sets $\mathcal{X}$ and $\mathcal{E}$ and for varying levels of support-set knowledge; [14] analyzed the special
case where both support sets are unknown, but one is chosen arbitrarily and the other at random.
The recovery guarantees in [15]-[17] require that the measurement matrix $\mathbf{A}$ is chosen at random
and that $\mathbf{B}$ is unitary. The guarantees in [3], [18]-[20] characterize $\mathbf{A}$ by the restricted isometry
property (RIP), which is, in general, difficult to verify in practice. The recovery guarantees in
[16], [18] require $\mathbf{B}$ to be unitary, whereas [19], [20] only consider a single dictionary $\mathbf{A}$ and
partial support-set knowledge within $\mathbf{A}$. The case of support-set knowledge was also addressed
in [21], but for a model that differs considerably from the setting here. Specifically, [21] uses
a time-evolution model that incorporates support-set knowledge obtained in previous iterations
and the corresponding results are based on the RIP. Finally, [22] considered a model where the
interference is sparse in an unknown basis. The specific models and assumptions underlying
the results in [3], [15]-[22] reduce their utility for the applications outlined above.



A. Generality of the signal and interference model 

In this paper, we will exclusively focus on probabilistic results where the randomness is in the
signal and/or the interference but not in the dictionary. Furthermore, the dictionaries $\mathbf{A}$ and $\mathbf{B}$
will be characterized only by their coherence parameters and their dimensions. Such results
enable us to operate with a given (and arbitrary) pair of sparsifying dictionaries $\mathbf{A}$ and $\mathbf{B}$, rather
than hoping that the signal will be sparse in a randomly generated dictionary (as in [17]) or
that $\mathbf{A}$ satisfies the RIP. The following two application examples illustrate the generality of our
results.

1) Restoration of saturated signals: In this example, a signal $\mathbf{y} = \mathbf{A}\mathbf{x}$ is subject to
saturation [2]. This impairment is captured by setting $\mathbf{z} = g_a(\mathbf{y})$ in (1), where $g_a(\cdot)$ implements
element-wise saturation to $[-a, a]$ with $a$ being the saturation level. By writing $\mathbf{z} = \mathbf{y} + \mathbf{e}$ with
$\mathbf{e} = g_a(\mathbf{y}) - \mathbf{y}$, where $\mathbf{e}$ is non-zero only for the entries where the saturation in $\mathbf{z}$ occurs, we
see that for moderate saturation levels $a$, the vector $\mathbf{e}$ will be sparse. The reconstruction of
the (uncorrupted) signal $\mathbf{y}$ from the saturated measurement $\mathbf{z}$ amounts to recovering $\mathbf{x}$ from
$\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{e}$, followed by computing $\mathbf{y} = \mathbf{A}\mathbf{x}$.
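The decomposition of a saturated signal into signal plus sparse interference can be illustrated with a small real-valued sketch. The function names `saturate` and `saturation_interference` and the sample values are assumptions for illustration only; the locations of the interference are detected by comparing the entries of the observation to the saturation level.

```python
def saturate(y, a):
    """Element-wise saturation of a real-valued signal to the interval [-a, a]."""
    return [max(-a, min(a, v)) for v in y]


def saturation_interference(y, a):
    """Return (z, e, support) with z = y + e and e = g_a(y) - y."""
    z = saturate(y, a)
    e = [zi - yi for zi, yi in zip(z, y)]
    # Entries of z sitting at the saturation level mark the candidate
    # locations of the interference (a superset of supp(e) in general,
    # since an unclipped entry may equal the level exactly).
    support = [i for i, zi in enumerate(z) if abs(zi) >= a]
    return z, e, support


y = [0.2, -1.5, 0.7, 2.0, -0.1]   # illustrative signal values
z, e, E_known = saturation_interference(y, 1.0)
```

Only the two clipped entries of `y` produce non-zero entries in `e`, so `e` is sparse and its support is available before recovery.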

We assume that the signal $\mathbf{y} = \mathbf{A}\mathbf{x}$ is drawn from a stochastic model where $\mathbf{x}$ has a support
set chosen uniformly at random. Since the saturation artifacts modeled by $\mathbf{e}$ are dependent on $\mathbf{y}$,
we want to guarantee recovery for arbitrary $\mathcal{E}$. Furthermore, we can identify the locations where
the saturation occurs (e.g., by comparing the entries of $\mathbf{z}$ to the saturation level $a$) and hence,
we can assume that $\mathcal{E}$ is known prior to recovery. The recovery guarantees developed in this
paper include this particular combination of support-set knowledge and randomness as a special
case, whereas the recovery guarantees in [2], [14], [23] are unable to consider all aspects of this
model and turn out to be more restrictive.

2) Removal of impulse noise: Consider a signal $\mathbf{y} = \mathbf{A}\mathbf{x}$ that is subject to impulse noise.
Specifically, we observe $\mathbf{z} = \mathbf{y} + \mathbf{e}$, where $\mathbf{e}$ is the impulse noise vector. For a sufficiently low
impulse-noise rate, $\mathbf{e}$ will be sparse in the identity basis, i.e., $\mathbf{B} = \mathbf{I}$. As before, consider the
setting where $\mathbf{y} = \mathbf{A}\mathbf{x}$ is generated from a stochastic model with unknown support set $\mathcal{X}$. Since
impulse noise does not, in general, depend on the signal $\mathbf{y}$, we may choose $\mathcal{E}$ at random. In
addition, the locations $\mathcal{E}$ of the impulse noise are normally unknown.

Recovery guarantees for this setting are partially covered by [2], [14], [23]. However, as for
the saturation example above, the recovery guarantees in [2], [14], [23] are unable to exploit
all aspects of support-set knowledge and randomness. The results developed here cover this
particular setting as a special case and hence, lead to less restrictive recovery guarantees.

In fact, there is an even more general setting compared to (1), which encompasses the cases
listed in Table I. Specifically, a generalization would be to consider the model $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$
with $\mathcal{X} = \mathrm{supp}(\mathbf{x}) = \mathcal{X}_r \cup \mathcal{X}_a$ and $\mathcal{E} = \mathrm{supp}(\mathbf{e}) = \mathcal{E}_r \cup \mathcal{E}_a$ where the support set $\mathcal{X}$ is known
and $\mathcal{E}$ is unknown, and, furthermore, $\mathcal{X}_a$ and $\mathcal{E}_a$ are chosen arbitrarily and $\mathcal{X}_r$ and $\mathcal{E}_r$ are chosen
uniformly at random. The analysis of this model, however, is left for future work.$^{1}$

B. Contributions 

In this paper, we present probabilistic recovery guarantees that improve or refine the ones in
[2], [14], [23] and cover novel cases for varying degrees of knowledge of the signal and interference
support sets. Our results depend on the coherence parameters of the two dictionaries $\mathbf{A}$ and $\mathbf{B}$,
their dimensions, and their spectral norms. In particular, we present novel recovery guarantees
for the situations where the support sets $\mathcal{X}$ and/or $\mathcal{E}$ are chosen at random, and for the cases
where knowledge of neither, one, or both support sets $\mathcal{X}$ and $\mathcal{E}$ is available prior to recovery.
For the case where one support set is random and the other arbitrary, but no knowledge of $\mathcal{X}$
and $\mathcal{E}$ is available, we present a recovery guarantee that improves upon (i.e., is less restrictive
than) the existing one in [14, Thm. 6]. Finally, we show that $\ell_1$-norm minimization is able to
recover the vectors $\mathbf{x}$ and $\mathbf{e}$ with overwhelming probability, even if the number of non-zero
components in both scales (near) linearly with the number of measurements.

$^{1}$ Note that our model corresponds to the case where two of the sets $\mathcal{X}_r$, $\mathcal{X}_a$, $\mathcal{E}_r$, and $\mathcal{E}_a$ are forced to be the empty set.

TABLE I: Summary of all recovery guarantees for sparsely corrupted signals.

| Support-set knowledge | $\mathcal{X}$, $\mathcal{E}$ arbitrary | $\mathcal{X}$ random, $\mathcal{E}$ arbitrary | $\mathcal{X}$ arbitrary, $\mathcal{E}$ random | $\mathcal{X}$, $\mathcal{E}$ random |
|---|---|---|---|---|
| $\mathcal{X}$, $\mathcal{E}$ known | Case 1a: [2, Thm. 3] | Case 1b: Theorem 1 | Case 1b: Theorem 1 | Case 1c: Theorem 1 |
| $\mathcal{E}$ known | Case 2a: [2, Thm. 4] | Case 2b: Theorem 2 | Case 2c: Theorem 4 | Case 2d: Theorem 3 |
| $\mathcal{X}$ known | Case 2a: [2, Cor. 6] | Case 2c: Theorem 4 | Case 2b: Theorem 2 | Case 2d: Theorem 3 |
| neither known | Case 3a: [14, Thms. 2 and 3] | Case 3b: Theorem 5 and [14, Thm. 6] | Case 3b: Theorem 5 and [14, Thm. 6] | Case 3c: Theorem 6 |

A summary of all the cases studied in this paper is given in Table I; the theorems highlighted
in dark gray indicate novel recovery guarantees, light gray indicates refined ones. We will only
prove the boldface theorems; the corresponding symmetric cases are shown in italics and the
associated recovery guarantees can be obtained by interchanging the roles of $\mathbf{x}$ and $\mathbf{e}$.

C. Notation 

Lowercase and uppercase boldface letters stand for column vectors and matrices, respectively.
For the matrix $\mathbf{M}$, we denote its transpose, adjoint, and (Moore-Penrose) pseudo-inverse by
$\mathbf{M}^T$, $\mathbf{M}^H$, and $\mathbf{M}^{\dagger}$, respectively. The $j$th column and the entry in the $i$th row and $j$th column
of the matrix $\mathbf{M}$ are designated by $\mathbf{m}_j$ and $[\mathbf{M}]_{i,j}$, respectively. The minimum and maximum
singular values of $\mathbf{M}$ are given by $\sigma_{\min}(\mathbf{M})$ and $\sigma_{\max}(\mathbf{M})$, respectively; the spectral norm is
$\|\mathbf{M}\|_{2,2} = \sigma_{\max}(\mathbf{M})$. The $\ell_1$-norm of the vector $\mathbf{v}$ is denoted by $\|\mathbf{v}\|_1$ and $\|\mathbf{v}\|_0$ stands for
the number of nonzero entries in $\mathbf{v}$. Sets are designated by upper-case calligraphic letters; the
cardinality of the set $\mathcal{S}$ is $|\mathcal{S}|$. The support set of $\mathbf{v}$, i.e., the indices of the nonzero entries, is
given by $\mathrm{supp}(\mathbf{v})$. The matrix $\mathbf{M}_{\mathcal{S}}$ is obtained from $\mathbf{M}$ by retaining the columns of $\mathbf{M}$ with
indices in $\mathcal{S}$; the vector $\mathbf{v}_{\mathcal{S}}$ is obtained analogously from the vector $\mathbf{v}$. The $\mathrm{sign}(\cdot)$ function
applied to a vector returns a vector consisting of the phases of each entry. The $N \times N$ restriction
matrix $\mathbf{R}_{\mathcal{S}}$ for the set $\mathcal{S} \subseteq \{1, \ldots, N\}$ has $[\mathbf{R}_{\mathcal{S}}]_{k,k} = 1$ if $k \in \mathcal{S}$ and is zero otherwise. For
random variables $X$ and $Y$, we define $\mathbb{E}^q[X] = (\mathbb{E}[|X|^q])^{1/q}$ to be the $q$th moment, which defines
an $\ell_q$-norm on the space of complex-valued random variables, and hence satisfies the triangle
inequality. We define $\mathbb{E}^q_X[f(X,Y)]$ to be the $q$th moment with respect to $X$ and we define
$\mathbb{1}[\mu \neq 0]$ to be equal to 1 if the condition $\mu \neq 0$ holds and 0 otherwise. For two functions $f$ and
$g$ we write $f \sim g$ to indicate that $f(n)/g(n) \to 1$ as $n \to \infty$, and we say that "$f$ scales with $g$."
Throughout the paper, $\mathcal{X} = \mathrm{supp}(\mathbf{x})$ is assumed to be of cardinality $n_x$ and $\mathcal{E} = \mathrm{supp}(\mathbf{e})$
of cardinality $n_e$. We define $\mathbf{D} = [\mathbf{A}\ \mathbf{B}]$ and $\mathbf{D}_{\mathcal{X},\mathcal{E}} = [\mathbf{A}_{\mathcal{X}}\ \mathbf{B}_{\mathcal{E}}]$ to be the sub-dictionary of $\mathbf{D}$
associated with the non-zero entries of $\mathbf{x}$ and $\mathbf{e}$. Similarly, we define the vector $\mathbf{s}_{\mathcal{X},\mathcal{E}} = [\mathbf{x}_{\mathcal{X}}^T\ \mathbf{e}_{\mathcal{E}}^T]^T$
which consists of the non-zero components of $\mathbf{s} = [\mathbf{x}^T\ \mathbf{e}^T]^T$.

D. Outline of the paper 

The remainder of the paper is organized as follows. Related prior work is summarized in
Section II. The main theorems are presented in Section III and a corresponding discussion is
given in Section IV. We conclude in Section V. All proofs are relegated to the Appendices.



II. Related Prior Work 

We next summarize relevant prior work on sparse signal recovery and sparsely corrupted 
signals, and we put our results into perspective. 

A. Coherence-based recovery guarantees 

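During the last decade, numerous deterministic and probabilistic guarantees for the recovery
of sparse signals from linear (and non-adaptive) measurements have been developed [23]-[31].
These results give sufficient conditions for when one can reconstruct the sparse signal vector $\mathbf{x}$
from the (interference-less) observation $\mathbf{y} = \mathbf{A}\mathbf{x}$ by solving

$$(\mathrm{P0}) \quad \underset{\mathbf{x}}{\text{minimize}}\ \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x},$$

or its convex relaxation, known as basis pursuit, defined as

$$(\mathrm{BP}) \quad \underset{\mathbf{x}}{\text{minimize}}\ \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}.$$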
In particular, in [24]-[26] it is shown that if $\|\mathbf{x}\|_0 \leq n_x$ for some $n_x < (1 + 1/\mu_a)/2$ with the
coherence parameter

$$\mu_a = \max_{i \neq j} |\mathbf{a}_i^H \mathbf{a}_j|, \tag{2}$$

then (P0) and (BP) are able to perfectly recover the sparse signal vector $\mathbf{x}$. Such coherence-based
recovery guarantees are, however, subject to the "square-root bottleneck," which only guarantees
the recovery of $\mathbf{x}$ for sparsity levels on the order of $n_x \sim \sqrt{m}$ [23]. This behavior is an immediate
consequence of the Welch bound [32] and dictates that the number of measurements must grow
at least quadratically in the sparsity level of $\mathbf{x}$ to guarantee recovery. In order to overcome this
square-root bottleneck, one must either resort to a RIP-based analysis, e.g., [27]-[30], which
typically requires randomness in the dictionary $\mathbf{A}$, or a probabilistic analysis that only considers
randomness in the vector $\mathbf{x}$, and where $\mathbf{A}$ is constant (known) and solely characterized by its
coherence parameter, dimension, and spectral norm [23]. In this paper, we are interested in the
latter type of results. Such probabilistic and coherence-based recovery guarantees that overcome
the square-root bottleneck have been derived for (P0) and (BP) in [23]. The corresponding
results, however, do not exploit the structure of the problem (1), i.e., the fact that we are dealing
with two dictionaries and that knowledge of $\mathcal{X}$ and/or $\mathcal{E}$ may be available prior to recovery.
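The link between the Welch bound and the square-root bottleneck can be made explicit with a short worked bound. The sketch below uses the standard form of the Welch bound for $n_a$ unit-norm columns in $\mathbb{C}^m$ and assumes, purely for illustration, an over-complete dictionary with $n_a \geq 2m$:

```latex
% Welch bound for n_a unit-norm columns in C^m:
\mu_a \;\ge\; \sqrt{\frac{n_a - m}{m\,(n_a - 1)}} .
% Illustrative assumption n_a >= 2m: then (n_a - m)/(n_a - 1) >= 1/2, so
\mu_a \;\ge\; \frac{1}{\sqrt{2m}}
\quad\Longrightarrow\quad
n_x \;<\; \frac{1}{2}\Bigl(1 + \frac{1}{\mu_a}\Bigr) \;\le\; \frac{1 + \sqrt{2m}}{2} .
```

Under this assumption the deterministic coherence-based threshold cannot exceed the order of $\sqrt{m}$, no matter how the dictionary is designed.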

B. Recovery guarantees for sparsely corrupted signals 

Guarantees for the recovery of sparsely corrupted signals as modeled by (1) have been
developed recently in [2], [13], [14]. The reference [2] considers deterministic (and coherence-based)
results for several cases,$^{2}$ which arise in different applications: 1) $\mathcal{X} = \mathrm{supp}(\mathbf{x})$ and
$\mathcal{E} = \mathrm{supp}(\mathbf{e})$ are known prior to recovery, 2) only one of $\mathcal{X}$ and $\mathcal{E}$ is known, and 3) neither $\mathcal{X}$
nor $\mathcal{E}$ is known. For case 1), the non-zero entries of both the signal and interference vectors
can be recovered by

$$\mathbf{s}_{\mathcal{X},\mathcal{E}} = \mathbf{D}_{\mathcal{X},\mathcal{E}}^{\dagger}\,\mathbf{z}, \tag{3}$$

if the recovery guarantee in [2, Thm. 2] is satisfied. For case 2), recovery is performed by using
modified versions of (P0) and (BP); the associated recovery guarantees can be found in
[2, Thm. 4 and Cor. 6]. For case 3), recovery guarantees for the standard (P0) or (BP) algorithms
are given in [14, Thms. 2 and 3]. However, all these recovery guarantees suffer from the
square-root bottleneck, as they guarantee recovery for all signal and all interference vectors satisfying
the given sparsity constraints. A notable exception for case 3) was discussed in [14, Thm. 6].
There, $\mathbf{e}$ is assumed to be random, but $\mathbf{x}$ is assumed to be arbitrary. This model overcomes the
square-root bottleneck and is able to significantly improve upon the corresponding deterministic
recovery guarantees in [14, Thms. 2 and 3].

$^{2}$ Note that no efficient recovery algorithm with corresponding guarantees is known for the case studied in [2], where only the
cardinality of $\mathcal{X}$ or $\mathcal{E}$ is known. Thus, we do not consider this case in the remainder of the paper.
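The recovery rule (3) for known supports amounts to a least-squares solve restricted to the columns indexed by the two support sets. A minimal pure-Python sketch follows; the real-valued toy dictionaries (a normalized 4 x 4 Hadamard matrix for A, the identity for B), the helper names, and the use of the normal equations in place of an explicit pseudo-inverse are all assumptions made for illustration.

```python
def columns(M, idx):
    """Keep only the columns of M whose (0-based) indices are in idx."""
    return [[row[j] for j in idx] for row in M]


def hstack(P, Q):
    return [p + q for p, q in zip(P, Q)]


def transpose(M):
    return [list(c) for c in zip(*M)]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(a * b for a, b in zip(r, c)) for c in Qt] for r in P]


def solve(G, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    s = [0.0] * n
    for r in reversed(range(n)):
        s[r] = (M[r][n] - sum(M[r][k] * s[k] for k in range(r + 1, n))) / M[r][r]
    return s


def recover_known_supports(A, B, X, E, z):
    """Least-squares estimate of the non-zero entries, given both supports."""
    D = hstack(columns(A, X), columns(B, E))   # sub-dictionary [A_X B_E]
    Dt = transpose(D)
    s = solve(matmul(Dt, D), matvec(Dt, z))    # normal equations (real case)
    return s[:len(X)], s[len(X):]


# Toy example: A is a normalized Hadamard matrix, B the identity.
A = [[0.5 * v for v in row] for row in
     [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]]
B = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
X, E = [0, 1], [2]
z = matvec(hstack(columns(A, X), columns(B, E)), [2.0, -1.0, 3.0])
x_hat, e_hat = recover_known_supports(A, B, X, E, z)
```

The solve succeeds whenever the sub-dictionary has full column rank, which is the property that the recovery guarantees for case 1) are designed to ensure.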

Another strain of recovery guarantees for sparsely corrupted signals that is able to overcome
the square-root bottleneck has been developed in [3], [15]-[20]. The references [15]-[17]
consider the case where $\mathbf{A}$ is random, whereas [3], [18]-[20] consider matrices $\mathbf{A}$ that are
characterized by the RIP, which is, in general, difficult to verify for a given (deterministic) $\mathbf{A}$.
Indeed, it has been recently shown that calculating the RIP for a given matrix is NP-hard [33].
Moreover, the recovery guarantees in [3], [15]-[18] require that $\mathbf{B}$ is an orthogonal matrix
and, hence, these results do not allow for arbitrary pairs of dictionaries $\mathbf{A}$ and $\mathbf{B}$. In addition,
[16], [18] do not study the impact of support-set knowledge on the recovery guarantees. The
results in [19], [20] only consider a single dictionary with partial support-set knowledge and,
thus, are unable to exploit the fact that the signal and interference exhibit sparse representations
in two different dictionaries. While all these assumptions are valid for applications based on
compressive sensing (see, e.g., [34], [35]), they are not suitable for the application scenarios
outlined in Section I.

To overcome the square-root bottleneck for arbitrary pairs of dictionaries $\mathbf{A}$ and $\mathbf{B}$, we next
propose a generalization of the probabilistic models developed in [14], [23] for the cases 1), 2),
and 3) outlined above. In particular, we impose a random model on the signal and/or interference
vectors rather than on the dictionaries, and we allow for varying degrees of knowledge of the
support sets $\mathcal{X}$ and $\mathcal{E}$. An overview of the coherence-based recovery guarantees developed next
is given in Table I.

III. Main Results 
The recovery guarantees developed next rely upon the models $\mathcal{M}(\mathrm{P0})$ and $\mathcal{M}(\mathrm{BP})$ summarized
in Model 1 and Model 2, respectively. Model 2 differs subtly from the model in [14] in that we
do not require the uniform phase assumption in the vector with known support, a setting which
was not considered in [14].

Model 1 $\mathcal{M}(\mathrm{P0})$

• Let $\mathbf{x} \in \mathbb{C}^{n_a}$ and $\mathbf{e} \in \mathbb{C}^{n_b}$ have support sets $\mathcal{X}$ and $\mathcal{E}$, respectively, of which at least one
is chosen uniformly at random and where the non-zero entries of both $\mathbf{x}$ and $\mathbf{e}$ are drawn
from a continuous distribution.

• The observation $\mathbf{z}$ is given by $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e}$.

Model 2 $\mathcal{M}(\mathrm{BP})$

• The conditions of $\mathcal{M}(\mathrm{P0})$ hold.

• If $\mathcal{X}$ or $\mathcal{E}$ is unknown, then assume that the corresponding non-zero entries of the associated
vector(s) are drawn from a continuous distribution, where the phases of the individual
components are independent and uniformly distributed on $[0, 2\pi)$.

In addition to Models 1 and 2, our results require the coherence
parameters$^{3}$ of the dictionaries $\mathbf{A}$ and $\mathbf{B}$, i.e., the coherence $\mu_a$ of $\mathbf{A}$ in (2), the coherence $\mu_b$
of $\mathbf{B}$ given by

$$\mu_b = \max_{i \neq j} |\langle \mathbf{b}_i, \mathbf{b}_j \rangle|,$$

and the mutual coherence $\mu_m$ between $\mathbf{A}$ and $\mathbf{B}$, defined as

$$\mu_m = \max_{i,j} |\langle \mathbf{a}_i, \mathbf{b}_j \rangle|.$$

Our main results for the cases highlighted in Table I are detailed next.
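The three coherence parameters are straightforward to evaluate numerically for small dictionaries. The following pure-Python sketch computes them for a unitary DFT matrix and the identity; the function names and the dimension $m = 8$ are illustrative assumptions, and the brute-force search over column pairs is only practical for small problem sizes.

```python
import cmath
import math


def dft_matrix(m):
    """Unitary m x m DFT matrix (columns have unit Euclidean norm)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / m) / math.sqrt(m)
             for c in range(m)] for r in range(m)]


def identity(m):
    return [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]


def column(M, j):
    return [row[j] for row in M]


def inner(u, v):
    """Inner product <u, v> with conjugation of the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def coherence(M):
    """Largest |<m_i, m_j>| over distinct columns i != j."""
    cols = [column(M, j) for j in range(len(M[0]))]
    return max((abs(inner(cols[i], cols[j]))
                for i in range(len(cols)) for j in range(i + 1, len(cols))),
               default=0.0)


def mutual_coherence(A, B):
    """Largest |<a_i, b_j>| over all column pairs of A and B."""
    cols_a = [column(A, j) for j in range(len(A[0]))]
    cols_b = [column(B, j) for j in range(len(B[0]))]
    return max(abs(inner(a, b)) for a in cols_a for b in cols_b)


m = 8
A, B = dft_matrix(m), identity(m)
mu_a, mu_b, mu_m = coherence(A), coherence(B), mutual_coherence(A, B)
```

For this pair both individual coherences vanish (up to rounding) and the mutual coherence equals $1/\sqrt{m}$, which is the smallest value it can take for two orthonormal bases.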

A. Cases 1b and 1c: $\mathcal{X}$ and $\mathcal{E}$ known

We start with the case where both support sets $\mathcal{X}$ and $\mathcal{E}$ are known prior to recovery. The
following theorem guarantees recovery of $\mathbf{x}$ and $\mathbf{e}$ from $\mathbf{z}$, using (3), with high probability.

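Theorem 1 (Cases 1b and 1c): Let $\mathbf{x}$ and $\mathbf{e}$ be signals satisfying the conditions of $\mathcal{M}(\mathrm{P0})$,
assume that both $\mathcal{X}$ and $\mathcal{E}$ are known, and choose $\beta \geq \log(n_x)$. If $\mathcal{X}$ is chosen uniformly at
random, $\mathcal{E}$ is arbitrary, and if

$$
\begin{aligned}
\delta e^{-1/4} \geq{}& \|\mathbf{A}\|_{2,2}\,\|\mathbf{B}\|_{2,2}\sqrt{\frac{n_x}{n_a}} + 12\mu_a\sqrt{\beta n_x} + (n_e - 1)\mu_b \\
&+ \mathbb{1}[\mu_a \neq 0]\,\frac{2n_x}{n_a}\,\|\mathbf{A}\|_{2,2}^2 + 3\mu_m\sqrt{2\beta n_e}
\end{aligned}
\tag{4}
$$

holds with$^{4}$ $\delta = 1$, then we can recover $\mathbf{x}$ and $\mathbf{e}$ using (3) with probability at least $1 - e^{-\beta}$.
If both $\mathcal{X}$ and $\mathcal{E}$ are chosen at random and if

$$
\begin{aligned}
\delta e^{-1/4} \geq{}& 12\sqrt{\beta}\left(\mu_a\sqrt{n_x} + \mu_b\sqrt{n_e}\right) + \mathbb{1}[\mu_a \neq 0]\,\frac{2n_x}{n_a}\,\|\mathbf{A}\|_{2,2}^2 + \mathbb{1}[\mu_b \neq 0]\,\frac{2n_e}{n_b}\,\|\mathbf{B}\|_{2,2}^2 \\
&+ \min\left\{ 3\mu_m\sqrt{2\beta n_e} + \sqrt{\frac{n_x}{n_a}}\,\|\mathbf{A}^H\mathbf{B}\|_{2,2},\ 3\mu_m\sqrt{2\beta n_x} + \sqrt{\frac{n_e}{n_b}}\,\|\mathbf{A}^H\mathbf{B}\|_{2,2} \right\}
\end{aligned}
\tag{5}
$$

holds with $\delta = 1$ and $\beta \geq \max\{\log(n_x), \log(n_e)\}$, then we can recover $\mathbf{x}$ and $\mathbf{e}$ using (3) with
probability at least $1 - e^{-\beta}$.

Proof: See Appendix B. ■

$^{3}$ Note that we could also characterize the dictionaries $\mathbf{A}$ and $\mathbf{B}$ with the cumulative coherence [25]. For the sake of simplicity
of exposition, however, we stick to the coherence parameters $\mu_a$, $\mu_b$, and $\mu_m$ only.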

A discussion of the recovery conditions (4) and (5) is relegated to Section IV.$^{5}$
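To get a feel for condition (4), it helps to evaluate its right-hand side numerically. The sketch below follows the form of (4) as stated in Theorem 1; the function names, the argument order, and the example values (a unitary, maximally incoherent pair with $m = 10^8$ and $\beta = \log(m)$) are assumptions made for illustration.

```python
import math


def rhs_condition_4(n_x, n_e, n_a, mu_a, mu_b, mu_m, norm_a, norm_b, beta):
    """Right-hand side of condition (4): X random, E arbitrary."""
    cross = norm_a * norm_b * math.sqrt(n_x / n_a)
    self_a = 12.0 * mu_a * math.sqrt(beta * n_x)
    self_b = (n_e - 1) * mu_b
    # Correction term that vanishes when mu_a = 0 (e.g., A unitary).
    correction = (2.0 * n_x / n_a) * norm_a ** 2 if mu_a != 0 else 0.0
    mutual = 3.0 * mu_m * math.sqrt(2.0 * beta * n_e)
    return cross + self_a + self_b + correction + mutual


def condition_4_holds(n_x, n_e, n_a, mu_a, mu_b, mu_m,
                      norm_a, norm_b, beta, delta=1.0):
    """Check delta * exp(-1/4) >= right-hand side of (4)."""
    rhs = rhs_condition_4(n_x, n_e, n_a, mu_a, mu_b, mu_m,
                          norm_a, norm_b, beta)
    return delta * math.exp(-0.25) >= rhs


# Unitary, maximally incoherent pair: spectral norms 1, mu_a = mu_b = 0.
m = 10 ** 8
beta = math.log(m)
small_e = condition_4_holds(10 ** 6, 10 ** 4, m, 0.0, 0.0,
                            1.0 / math.sqrt(m), 1.0, 1.0, beta)
large_e = condition_4_holds(10 ** 6, 10 ** 6, m, 0.0, 0.0,
                            1.0 / math.sqrt(m), 1.0, 1.0, beta)
```

With these values the condition holds for $n_e = 10^4$ but fails for $n_e = 10^6$, which reflects the $\sqrt{\beta n_e}$ growth of the mutual-coherence term.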



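B. Cases 2b and 2d: $\mathcal{E}$ known

Consider the case where only the support set $\mathcal{E}$ of $\mathbf{e}$ is known prior to recovery. In this case,
recovery of $\mathbf{x}$ (and the non-zero entries of $\mathbf{e}$) from $\mathbf{z}$ can be achieved by solving [2]$^{6}$

$$
(\mathrm{P0}^*, \mathcal{E}) \quad
\begin{cases}
\underset{\mathbf{x},\,\mathbf{e}_{\mathcal{E}}}{\text{minimize}} & \|\mathbf{x}\|_0 + \|\mathbf{e}_{\mathcal{E}}\|_0 \\
\text{subject to} & \mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}_{\mathcal{E}}\mathbf{e}_{\mathcal{E}},
\end{cases}
\tag{6}
$$

or its convex relaxation$^{7}$

$$
(\mathrm{BP}^*, \mathcal{E}) \quad
\begin{cases}
\underset{\mathbf{x},\,\mathbf{e}_{\mathcal{E}}}{\text{minimize}} & \|\mathbf{x}\|_1 + \|\mathbf{e}_{\mathcal{E}}\|_1 \\
\text{subject to} & \mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}_{\mathcal{E}}\mathbf{e}_{\mathcal{E}}.
\end{cases}
\tag{7}
$$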



$^{4}$ Later we will require (4) to hold for different values of $\delta$.

$^{5}$ In order to slightly improve the conditions in (4) or (5), one could replace the term $(n_e - 1)\mu_b$ with the cumulative coherence
as defined in [25].

$^{6}$ Note that since $\mathcal{E}$ is known, the term $\|\mathbf{e}_{\mathcal{E}}\|_0$ in $(\mathrm{P0}^*, \mathcal{E})$ can be omitted. We keep the term, however, for the sake of consistency
with the problem $(\mathrm{BP}^*, \mathcal{E})$.

$^{7}$ Note that the convex optimization problem $(\mathrm{BP}^*, \mathcal{E})$ considered here differs slightly from the problem $(\mathrm{BP}, \mathcal{E})$ proposed in [2] for the
case where $\mathcal{E}$ is known prior to recovery. In practice, however, both problems exhibit similar recovery performance.
The following theorems guarantee the recovery of $\mathbf{x}$ and $\mathbf{e}$ from $\mathbf{z}$, using $(\mathrm{P0}^*, \mathcal{E})$ or $(\mathrm{BP}^*, \mathcal{E})$,
with high probability.

Theorem 2 (Case 2b): Let $\mathbf{x}$ and $\mathbf{e}$ be signals satisfying the conditions of $\mathcal{M}(\mathrm{P0})$, assume
that $\mathcal{E}$ is known prior to recovery and chosen arbitrarily, and assume that $\mathcal{X}$ is unknown and
drawn uniformly at random. Choose $\beta \geq \log(n_x)$. If (4) holds for some $\delta$ where $0 < \delta < 1$ and
if

$$n_x\mu_a^2 + n_e\mu_m^2 < 1 - \delta, \tag{8}$$

then we can recover $\mathbf{x}$ and $\mathbf{e}$ using $(\mathrm{P0}^*, \mathcal{E})$ with probability at least $1 - e^{-\beta}$.

Moreover, if $\mathbf{x}$ and $\mathbf{e}$ are signals satisfying the conditions of $\mathcal{M}(\mathrm{BP})$, and, in addition to (4),
if

$$n_x\mu_a^2 + n_e\mu_m^2 < \frac{(1-\delta)^2}{2(\log(n_a) + \beta)} \tag{9}$$

holds, then we can recover $\mathbf{x}$ and $\mathbf{e}$ using $(\mathrm{BP}^*, \mathcal{E})$ with probability at least $1 - 3e^{-\beta}$.

Proof: See Appendices C and D. ■

Note that by combining (4), (8), and possibly (9) into a single recovery condition, thereby
effectively removing $\delta$, we can easily calculate the largest values of $n_x$ and $n_e$ for which
successful recovery with high probability is guaranteed (see Section IV-C for a corresponding
discussion).

Theorem 3 (Case 2d): Let $\mathbf{x}$ and $\mathbf{e}$ be signals satisfying the conditions of $\mathcal{M}(\mathrm{P0})$, assume
that $\mathcal{E}$ is known but $\mathcal{X}$ is unknown prior to recovery, and assume that both $\mathcal{X}$ and $\mathcal{E}$ are drawn
uniformly at random. If (5) and (8) hold for some $0 < \delta < 1$ and $\beta \geq \max\{\log(n_x), \log(n_e)\}$,
then we can recover $\mathbf{x}$ and $\mathbf{e}$ using $(\mathrm{P0}^*, \mathcal{E})$ with probability at least $1 - e^{-\beta}$.

Moreover, if $\mathbf{x}$ and $\mathbf{e}$ are signals satisfying the conditions of $\mathcal{M}(\mathrm{BP})$ and if (9) holds in
addition to (5) and (8), then we can recover $\mathbf{x}$ and $\mathbf{e}$ using $(\mathrm{BP}^*, \mathcal{E})$ with probability at least
$1 - 3e^{-\beta}$.

Proof: See Appendices C and D. ■



A discussion of both theorems is relegated to Section IV.



C. Case 2c: $\mathcal{X}$ known

The case where $\mathcal{X}$ is random and known, and $\mathcal{E}$ is unknown and arbitrary, differs slightly from the
case where $\mathcal{X}$ is random and unknown, and $\mathcal{E}$ is arbitrary and known (covered by Theorem 2).
Hence, we need to consider both cases separately. The recovery problems $(\mathrm{P0}^*, \mathcal{X})$ and $(\mathrm{BP}^*, \mathcal{X})$
required here are defined analogously to $(\mathrm{P0}^*, \mathcal{E})$ and $(\mathrm{BP}^*, \mathcal{E})$.

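Theorem 4 (Case 2c): Let $\mathbf{x}$ and $\mathbf{e}$ be signals satisfying the conditions of $\mathcal{M}(\mathrm{P0})$, assume
that the support set $\mathcal{X}$ is known and chosen uniformly at random, and assume that $\mathcal{E}$ is unknown
and arbitrary. If

$$
\begin{aligned}
\delta e^{-1/4} \geq{}& \|\mathbf{A}\|_{2,2}\,\|\mathbf{B}\|_{2,2}\sqrt{\frac{n_e}{n_b}} + 12\mu_b\sqrt{\beta n_e} + (n_x - 1)\mu_a \\
&+ \mathbb{1}[\mu_b \neq 0]\,\frac{2n_e}{n_b}\,\|\mathbf{B}\|_{2,2}^2 + 3\mu_m\sqrt{2\beta n_x}
\end{aligned}
\tag{10}
$$

holds for some $0 < \delta < 1$ and $\beta \geq \log(n_e)$, and if

$$n_x\mu_m^2 + n_e\mu_b^2 < 1 - \delta, \tag{11}$$

then we can recover $\mathbf{x}$ and $\mathbf{e}$ using $(\mathrm{P0}^*, \mathcal{X})$ with probability at least $1 - e^{-\beta}$.

Moreover, if $\mathbf{x}$ and $\mathbf{e}$ are signals satisfying the conditions of $\mathcal{M}(\mathrm{BP})$, and, in addition to (10),
if

$$n_x\mu_m^2 + n_e\mu_b^2 < \frac{(1-\delta)^2}{2(\log(n_b) + \beta)} \tag{12}$$

holds, then we can recover $\mathbf{x}$ and $\mathbf{e}$ using $(\mathrm{BP}^*, \mathcal{X})$ with probability at least $1 - 3e^{-\beta}$.

Proof: See Appendices C and D. ■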



A discussion of this theorem is relegated to Section IV.



D. Cases 3b and 3c: No support-set knowledge 

Recovery guarantees for the case of no support-set knowledge, but where one support set is
chosen at random and the other arbitrarily, can be found in [14, Thm. 6]. The theorem shown
next refines the result in [14, Thm. 6]. The refinements are due to the following facts: i)
we allow for arbitrary $0 < \delta < 1$, whereas $\delta = 1/2$ in [14, Thm. 6], ii) we add a correction
term improving the bounds when either $\mathbf{A}$ or $\mathbf{B}$ is unitary, and iii) we do not use a global
coherence parameter $\mu = \max\{\mu_a, \mu_b, \mu_m\}$, but rather we further exploit the individual coherence
parameters $\mu_a$, $\mu_b$, and $\mu_m$ of $\mathbf{A}$ and $\mathbf{B}$. See Section IV-A for a corresponding discussion.

Theorem 5 (Case 3b): Let $\mathbf{x}$ and $\mathbf{e}$ be signals satisfying the conditions of $\mathcal{M}(\mathrm{P0})$, assume
that $\mathcal{X}$ is chosen uniformly at random, and assume that $\mathcal{E}$ is arbitrary. If (4), (8), and (11) hold
for some $0 < \delta < 1$ and $\beta \geq \log(n_x)$, then

$$(\mathrm{P0}^*) \quad \underset{\mathbf{x},\,\mathbf{e}}{\text{minimize}}\ \|\mathbf{x}\|_0 + \|\mathbf{e}\|_0 \quad \text{subject to} \quad \mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e},$$

recovers $\mathbf{x}$ and $\mathbf{e}$ with probability at least $1 - e^{-\beta}$.

Moreover, if $\mathbf{x}$ and $\mathbf{e}$ are signals satisfying the conditions of $\mathcal{M}(\mathrm{BP})$ and if (9) and (12) hold
in addition to (4), (8), and (11), then

$$(\mathrm{BP}^*) \quad \underset{\mathbf{x},\,\mathbf{e}}{\text{minimize}}\ \|\mathbf{x}\|_1 + \|\mathbf{e}\|_1 \quad \text{subject to} \quad \mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{e},$$

recovers $\mathbf{x}$ and $\mathbf{e}$ with probability at least $1 - 3e^{-\beta}$.

Proof: See Appendices C and D. ■

Theorem 6 (Case 3c): Let $\mathbf{x}$ and $\mathbf{e}$ be signals satisfying the conditions of $\mathcal{M}(\mathrm{P0})$ and assume
that $\mathcal{X}$ and $\mathcal{E}$ are both unknown and chosen uniformly at random. If (5), (8), and (11) hold for
some $0 < \delta < 1$ and $\beta \geq \max\{\log(n_x), \log(n_e)\}$, then $(\mathrm{P0}^*)$ recovers $\mathbf{x}$ and $\mathbf{e}$ with probability
at least $1 - e^{-\beta}$.

Moreover, if $\mathbf{x}$ and $\mathbf{e}$ are signals from $\mathcal{M}(\mathrm{BP})$ and if (9) and (12) hold in addition to (5), (8),
and (11), then $(\mathrm{BP}^*)$ recovers $\mathbf{x}$ and $\mathbf{e}$ with probability at least $1 - 3e^{-\beta}$.

Proof: See Appendices C and D. ■

A discussion of both theorems is given below.

IV. Discussion of the Recovery Guarantees 



We now discuss the theorems presented in Section III. In particular, we study the impact of
support-set knowledge on the recovery guarantees and characterize the asymptotic behavior of
the corresponding recovery conditions, i.e., the threshold for which recovery is guaranteed with
high probability.

In the ensuing discussion, we consider two scenarios. For the first scenario, we assume that
$\mathbf{A}$ and $\mathbf{B}$ are unitary, i.e., $n_a = n_b = m$ and $\mu_a = \mu_b = 0$, and maximally incoherent, i.e.,
$\mu_m = 1/\sqrt{m}$. For example, $\mathbf{A}$ could be the discrete Fourier transform (or Hadamard) matrix
with appropriately normalized columns and $\mathbf{B}$ the identity matrix. The corresponding plots are
shown in Figure 1. For the second scenario, $\mathbf{A}$ is assumed to be unitary and $\mathbf{B}$ is assumed to
be the concatenation of two unitary matrices so that $m = n_a = 10^8$, $n_b = 2n_a$, $\mu_a = 0$, and
$\mu_b = \mu_m = 1/\sqrt{m}$ as described in [36], [37]. The corresponding plots are shown in Figure 2.
Fig. 1: $\mathbf{A}$ and $\mathbf{B}$ are assumed to be unitary with $m = n_a = n_b = 10$ and $\mu_m = 1/\sqrt{m}$. In
(a) the darker curves in the upper-right are for $m = 10^8$ and the lighter curves in the lower-left
are for $m = 10^4$. In (c) we show the recovery regions only for $(\mathrm{BP}^*)$. In each case, recovery is
guaranteed with probability at least $1 - 10^{-8}$.



In each case we set $\beta = \log(m)$ or $\beta = \log(3m)$ for the $\ell_0$-norm and $\ell_1$-norm-based recovery
problems, respectively, so that recovery is guaranteed with probability at least $1 - 1/m$.

In order to plot the recovery conditions, we note that for a pair of unitary matrices and a
given $n_e$, the recovery conditions of the theorems are quadratic equations in $\sqrt{n_x}$; this enables
us to calculate the maximum $n_x$ guaranteeing the successful recovery of $\mathbf{x}$ and $\mathbf{e}$ in closed form.
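As a worked instance of this closed-form calculation, consider Case 3b (Theorem 5) with $(\mathrm{P0}^*)$ for a unitary and maximally incoherent pair. The derivation below is a sketch that takes conditions (4), (8), and (11) as stated in Section III and assumes $n_x \geq n_e$; the shorthands $t$, $c$, and $\kappa$ are introduced here only for illustration.

```latex
% Assumptions: n_a = n_b = m, ||A||_{2,2} = ||B||_{2,2} = 1,
% mu_a = mu_b = 0, mu_m = 1/sqrt(m). Conditions (4), (8), (11) become
\delta e^{-1/4} \;\ge\; \sqrt{\tfrac{n_x}{m}} + 3\sqrt{\tfrac{2\beta n_e}{m}},
\qquad \tfrac{n_e}{m} < 1 - \delta,
\qquad \tfrac{n_x}{m} < 1 - \delta .
% With t = sqrt(n_x/m), c = 3 sqrt(2 beta n_e / m), kappa = e^{-1/4}
% and n_x >= n_e, a valid delta in (0,1) exists if and only if
t + c \;<\; \kappa\,(1 - t^2)
\;\Longleftrightarrow\;
\kappa t^2 + t + (c - \kappa) \;<\; 0 .
% For c < kappa, the admissible range is bounded by the positive root:
t \;<\; \frac{-1 + \sqrt{1 + 4\kappa(\kappa - c)}}{2\kappa},
\qquad\text{i.e.,}\qquad
n_x \;<\; m\left(\frac{-1 + \sqrt{1 + 4\kappa(\kappa - c)}}{2\kappa}\right)^{2} .
```

The requirement $c < \kappa$ limits how large $n_e$ may be before no positive $n_x$ remains admissible.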



A. Recovery guarantees 



1) $\mathcal{X}$ and $\mathcal{E}$ known: Figure 1a shows the recovery conditions for the cases when both support
sets $\mathcal{X}$ and $\mathcal{E}$ are assumed to be known. For small problem dimensions, i.e., $m = 10^4$, the
recovery conditions where both support sets are assumed to be arbitrary turn out to be less
restrictive than for the case where both support sets are chosen at random. For large problem
dimensions, i.e., $m = 10^8$, we see, however, that the probabilistic results of Theorem 1 guarantee
the recovery (with high probability) for larger $n_x$ and $n_e$ than the deterministic results of [2]
considering arbitrary support sets. Hence, the probabilistic recovery conditions presented here
require a sufficiently large problem size in order to outperform the corresponding deterministic
results. We furthermore see from Figure 1a that one can guarantee the recovery of signals having
a larger number of non-zero entries if both support sets are chosen at random compared to the
situation where $\mathcal{X}$ is random but $\mathcal{E}$ is arbitrary.

2) Only $\mathcal{E}$ known: Figure 1b shows the recovery conditions from Theorems 2 and 3 for the
cases where only $\mathcal{E}$ is known prior to recovery (the case of only $\mathcal{X}$ known behaves analogously).
We see that for a random $\mathcal{X}$ and random $\mathcal{E}$, successful recovery at high probability is guaranteed
for significantly larger $n_x$ and $n_e$ compared to the case where one or both support sets are assumed
to be arbitrary. Hence, having more randomness in the support sets leads to less restrictive
recovery guarantees. We also see that the recovery conditions for $(\mathrm{P0}^*, \mathcal{E})$ are slightly less
restrictive than those for $(\mathrm{BP}^*, \mathcal{E})$.



3) No support-set knowledge: Finally, Figure 1c shows the recovery conditions for $(\mathrm{BP}^*)$ for
the case of no support-set knowledge. We see that for random $\mathcal{X}$ and $\mathcal{E}$, successful recovery is
guaranteed for significantly larger $n_x$ and $n_e$ compared to the case where one or both support
sets are assumed to be arbitrary. As a comparison, we also show the recovery conditions derived
in [14, Thm. 6] and the conditions from [23], the latter of which does not take into account the
structure of the problem (1). We see that the recovery conditions derived in Theorems 5 and 6
are less restrictive, i.e., they guarantee the successful recovery (with high probability) for a larger
number of nonzero coefficients in both the sparse signal vector $\mathbf{x}$ and the sparse interference $\mathbf{e}$.
4) Non-unitary $\mathbf{B}$: We now consider the setting where $\mathbf{B}$ is the concatenation of two
unitary matrices and plot the corresponding recovery threshold for differing levels of support-set
knowledge in Figure 2. For a fixed $n_x$ and $n_a$, we see that by increasing $n_b$ and $\mu_b$, we suffer a
significant loss in the number of non-zero entries of $\mathbf{e}$ that we can recover, when compared to
the case where $\mathbf{B}$ is unitary. However, the number of non-zero entries of $\mathbf{x}$ that we can guarantee
to recover is virtually unchanged, an effect which is also present in the deterministic recovery
conditions [2].

B. Impact of support-set knowledge 

As detailed in [2], having knowledge of the support set of $\mathbf{x}$ or $\mathbf{e}$ implies that one can guarantee
the recovery of $\mathbf{x}$ and $\mathbf{e}$ having up to twice as many non-zero entries (compared to the case of
no support-set knowledge).

A similar behavior is also apparent in the probabilistic results presented here. Specifically, for unitary and maximally incoherent A and B, the recovery conditions in Figure 3 using (3), (P0), and (P0*, $\mathcal{E}$) show a similar factor-of-two gain in the case where both $\mathcal{X}$ and $\mathcal{E}$ are chosen at random. For example, knowledge of $\mathcal{X}$ enables one to recover a pair (x, e) with approximately twice as many non-zero entries compared to the case of not knowing $\mathcal{X}$. In Figure 4 we show the recovery conditions for the case where one dictionary is unitary, but the other is a concatenation of two unitary matrices, as described earlier in Section IV. We again see that the extra support-set knowledge allows us to guarantee the recovery of a signal with more non-zero entries. It is interesting to note that in both of these scenarios, by adding the knowledge of one of the support sets, we increase the number of non-zero components we can guarantee to recover in the other signal component. For example, by knowing $\mathcal{X}$ prior to recovery, we can guarantee to recover a signal with more non-zero entries in e.

Fig. 2: A is assumed to be unitary and B is assumed to be the concatenation of two unitary matrices so that $m = n_a = 10^8$, $n_b = 2n_a$, $\mu_a = 0$, and $\mu_b = \mu_m = 1/\sqrt{m}$ as described in [36], [37]. In (c) we show the recovery regions only for (BP*). In each case, recovery is guaranteed with probability at least $1 - 10^{-8}$.

We note that a similar gain is apparent for $\mathcal{X}$ arbitrary and $\mathcal{E}$ random, as well as for using (BP) and (BP*, $\mathcal{E}$) instead of (P0) and (P0*, $\mathcal{E}$).



C. Asymptotic behavior of the recovery conditions 

We now compare the asymptotic behavior of probabilistic and deterministic recovery conditions, i.e., we study the scaling behavior of $n_x$ and $n_e$. To this end, we are interested in the largest $n_x$ for which recovery of x (and e) from z can be guaranteed with high probability. In particular, we consider the following models for the sparse interference vector e: i) constant sparsity, i.e., $n_e = 10^3$, ii) sparsity proportional to the square root of the problem size, i.e., $n_e = \sqrt{m}$, and iii) sparsity proportional to the problem size, i.e., $n_e = m/10^5$.

Fig. 3: Impact of support-set knowledge on the recovery conditions for (3), (P0), and (P0*, $\mathcal{E}$) in the case where $\mathcal{X}$ and $\mathcal{E}$ are both random. A and B are unitary with $m = n_a = n_b = 10^6$ (lower-left curves) and $m = n_a = n_b = 10^8$ (upper-right curves) and $\mu_m = 1/\sqrt{m}$.

Figure 5 shows the largest $n_x$ for which recovery can be guaranteed using (BP*, $\mathcal{E}$). Here, $\mathcal{E}$ is assumed to be known and arbitrary and $\mathcal{X}$ is unknown and chosen at random. Note that the other cases of support-set knowledge and arbitrary/random supports exhibit the same scaling behavior. We see from Figure 5 that for a constant interference sparsity (i.e., $n_e = 10^3$), the probabilistic and deterministic results show the same scaling behavior. For the cases where $n_e$ scales with $\sqrt{m}$ or $m$, however, the deterministic thresholds developed in [2] result in worse scaling, while the behavior of the probabilistic guarantees derived in this paper remains unaffected.

Fig. 4: Impact of support-set knowledge on the recovery conditions for (3), (P0), and (P0*, $\mathcal{E}$) in the case where $\mathcal{X}$ and $\mathcal{E}$ are both random. In the top left we assume A is unitary and B is the concatenation of two unitary matrices so that $m = n_a = 10^8$, $n_b = 2n_a$, $\mu_a = 0$, and $\mu_b = \mu_m = 1/\sqrt{m}$ as described in [36], [37]. For the curves in the bottom right (with $\mathcal{X}$ known/unknown and $\mathcal{E}$ known) we reverse the roles of A and B, so that now B is unitary.

We now investigate the scaling behavior observed in Figure 5 analytically. Again, we only consider the case where $\mathcal{X}$ is unknown and chosen at random and $\mathcal{E}$ is known and chosen arbitrarily; an analysis of the other cases yields similar results. Assume that A and B are unitary and maximally incoherent, i.e., $\mu_a = \mu_b = 0$, $n_a = n_b = m$, and $\mu_m = 1/\sqrt{m}$. Then, by Theorem 2 the recovery of x from z using (BP*, $\mathcal{E}$) is guaranteed with probability at least $1 - 3/n_a$ (i.e., for $\beta = \log(n_a)$) if

$$\delta e^{-1/4} \ge \sqrt{n_x/n_a} + 3\mu_m\sqrt{2\beta n_e}$$

and

$$2 n_e \mu_m^2 \left(\log(n_a) + \beta\right) < (1 - \delta)^2$$

hold. Combining these two conditions (i.e., requiring that some $\delta$ satisfies both, with $n_a = m$, $\mu_m^2 = 1/m$, and $\beta = \log(m)$) gives

$$e^{1/4}\sqrt{n_x} + \left(3\sqrt{2}\,e^{1/4} + 2\right)\sqrt{n_e \log(m)} < \sqrt{m}. \tag{13}$$



Hence, if $n_x \sim m$ and $n_e \sim m/\log(m)$, the condition (13) can be satisfied. Consequently, recovery of x (and of e) is guaranteed with probability at least $1 - 3/m$ even if $n_x$ scales linearly in the number of (corrupted) measurements $m$ and $n_e$ scales near-linearly (i.e., with $m/\log(m)$) in $m$.

Fig. 5: Maximum signal sparsity $n_x$ that ensures recovery of x for $\mathcal{E}$ known and arbitrary. We assume $n_e = 10^3$, $n_e = \sqrt{m}$, and $n_e = m/10^5$. The probability of successful recovery is set to be at least $1 - 10^{-15}$.
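The two conditions above can be evaluated numerically. The following is a minimal sketch, assuming the unitary, maximally incoherent setting of this subsection ($n_a = n_b = m$, $\mu_m = 1/\sqrt{m}$, $\beta = \log(m)$); the function name `max_signal_sparsity` is hypothetical and the result ignores boundary effects of the strict inequality.

```python
import math

def max_signal_sparsity(m: int, n_e: int) -> int:
    """Largest n_x for which some delta satisfies both conditions
    (unitary, maximally incoherent pair; illustrative helper)."""
    beta = math.log(m)              # beta = log(n_a) with n_a = m
    mu_m = 1.0 / math.sqrt(m)       # mutual coherence of the pair
    # Second condition: 2 n_e mu_m^2 (log(n_a) + beta) < (1 - delta)^2,
    # so delta must stay below this supremum.
    delta_sup = 1.0 - math.sqrt(2.0 * n_e * mu_m**2 * (math.log(m) + beta))
    # First condition: delta e^{-1/4} >= sqrt(n_x / n_a) + 3 mu_m sqrt(2 beta n_e).
    slack = delta_sup * math.exp(-0.25) - 3.0 * mu_m * math.sqrt(2.0 * beta * n_e)
    if slack <= 0.0:
        return 0                    # no admissible delta exists
    return min(m, math.floor(m * slack**2))

if __name__ == "__main__":
    m = 10**8
    for n_e in (10, 10**3, 10**5):
        print(n_e, max_signal_sparsity(m, n_e))
```

For a fixed $m$, the returned value shrinks as $n_e$ grows and drops to zero once the interference term alone exhausts the budget on $\delta$.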



We finally note that the recovery guarantees in [16] also allow for the sparsity of the interference vector to scale near-linearly in the number of measurements. The results in [16], however, require the matrix A to be random and B to be orthogonal, whereas the recovery guarantees shown here are for arbitrary pairs of dictionaries A and B (characterized by the coherence parameters) and for varying degrees of support-set knowledge.



D. No error component 

It is worth briefly discussing how our results behave when there is no error, that is, when $n_e = 0$. In this case, the relevant setting is with $\mathcal{X}$ unknown and chosen uniformly at random. As Theorem 2 holds for any B, it suffices to take B equal to a single column,⁸ since $n_e = 0$ means we do not consider any component of B when attempting to recover the signals. And since the mutual coherence $\mu_m$ only appears as a product with $n_e$, it does not matter what we assume $\mu_m$ to be.

⁸ Taking B to be the zero matrix and so removing all the terms that appear in the recovery conditions also leads to the same scaling behavior.

Thus, by taking $n_e = 0$ and applying Theorem 2, we find that for (P0*, $\mathcal{E}$), recovery is guaranteed with probability at least $1 - e^{-\beta}$ if



e"*(l - n x fi 2 a ) > ||A|| 2 J— + 12fjL a y/p^. (14) 

V ^a 

For (BP*, $\mathcal{E}$), recovery is guaranteed with probability at least $1 - 3e^{-\beta}$ if

e-i (l - y/2n x nl{\og{n a ) + f3)) > \\A\\ 2 J^- + I2fi a y/^. (15) 

Now assume that $\mu_a \sim 1/\sqrt{m}$, $\|A\|_{2,2}^2 = n_a/m$, and that $\beta = \log(n_a)$. Then (after ignoring lower order terms), we find that (14) and (15) imply recovery with probability at least $1 - 1/n_a$ and $1 - 3/n_a$, respectively, provided that

$$m \ge C\, n_x \log(n_a),$$

for some positive constant $C$. This result is in accordance with [23], the RIP-based proof of [38], which requires $m \ge C_0\, n_x \log(n_a/n_x)$ to guarantee recovery with high probability, and the random sub-sampling model of [27], which, for a maximally incoherent sparsity basis and measurement matrix,⁹ requires $m \ge C_1\, n_x \log(n_a)$ to guarantee recovery with high probability. Thus, our results reduce to some of the existing results in the setting where there is no error.

V. Conclusions 

In this paper, we have presented novel coherence-based recovery guarantees for sparsely corrupted signals in the probabilistic setting. In particular, we have studied the case where the sparse signal and/or sparse interference vectors are modeled as random and the dictionaries A and B are solely characterized by their coherence parameters. Our recovery guarantees complete all missing cases of support-set knowledge and improve and refine the results in [2], [14]. Furthermore, we have shown that the reconstruction of sparse signals is guaranteed with high probability, even if the number of non-zero entries in both the sparse signal and sparse interference is allowed to scale (near) linearly with the number of (corrupted) measurements.

⁹ For example, measuring with a randomly sub-sampled Fourier matrix and taking the identity matrix as the sparsity basis, so that with the differently normalized definition of coherence as in [27], $\mu_a = 1$.
There are many avenues for follow-on work. The derivation of probabilistic recovery guarantees for the more general setting studied in [13], i.e., z = Ax + Be + n with n being additive noise and x and e being approximately sparse (rather than perfectly sparse), is left for future work. In addition, our framework could be generalized to the setting where we split both the known and the unknown support sets into a random and an arbitrary part, resulting in four parts, as outlined in Section I-A2. Finally, the derivation of probabilistic uncertainty relations for pairs of general dictionaries is an interesting open problem and would complete the deterministic uncertainty relations in [2], [14].
Appendix A
Bounds on $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}})$

We now derive probabilistic bounds on $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}})$, which are key in showing when the recovery from sparsely corrupted signals succeeds. We extend [14, Lemma 7] to the case where both supports $\mathcal{X}$ and $\mathcal{E}$ are chosen at random and give improved results for the case where only one support set is random. First, we require the following two results from [23].



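Theorem 7 (Thm. 8 of [23]): Let $M \in \mathbb{C}^{m \times n}$ be a matrix. Let $\mathcal{S} \subset \{1, 2, \ldots, n\}$ be a set of size $s$ drawn uniformly at random. Fix $q \ge 1$, then for each $p \ge \max\{2, 2\log(\operatorname{rank}(M R_{\mathcal{S}})), q/2\}$ we have

$$\mathbb{E}^q\big[\|M R_{\mathcal{S}}\|_{2,2}\big] \le 3\sqrt{p}\,\|M\|_{1,2} + \sqrt{\frac{s}{n}}\,\|M\|_{2,2},$$

where $\|M\|_{1,2} = \sup_{v \in \mathbb{C}^n} \|Mv\|_2 / \|v\|_1$ is the maximum $\ell_2$-norm of the columns of M.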

Lemma 8 (Eq. 6.1 of [23]): Let $M \in \mathbb{C}^{m \times n}$ be a matrix with coherence $\mu$ and let $\mathcal{S} \subset \{1, 2, \ldots, n\}$ be a set of size $s$ chosen uniformly at random. Then, for $\beta \ge \log(s)$ and $q = 4\beta$,

$$\mathbb{E}^q\big[\|M_{\mathcal{S}}^H M_{\mathcal{S}} - I\|_{2,2}\big] \le 12\mu\sqrt{\beta s} + \mathbb{1}[\mu \ne 0]\,\frac{2s}{n}\,\|M\|_{2,2}^2.$$

Note that the result in [23, Eq. 6.1] does not include the indicator function $\mathbb{1}[\mu \ne 0]$. It is, however, straightforward to verify that if M is orthonormal, then $\mu = 0$ and hence, $\|M_{\mathcal{S}}^H M_{\mathcal{S}} - I\|_{2,2} = 0$ for all sets $\mathcal{S}$.
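The coherence parameters used in these bounds are easy to compute for small explicit dictionaries. The following is a minimal sketch, assuming the usual definition of coherence as the largest absolute inner product between distinct unit-norm columns; the helper names (`dft_columns`, `identity_columns`, `coherence`, `mutual_coherence`) are illustrative and not taken from the paper. It checks that a unitary DFT matrix has coherence zero and is maximally incoherent with the identity.

```python
import cmath
import math

def dft_columns(m):
    """Columns of the unitary m x m DFT matrix (unit Euclidean norm)."""
    s = 1.0 / math.sqrt(m)
    return [[s * cmath.exp(-2j * math.pi * k * l / m) for k in range(m)]
            for l in range(m)]

def identity_columns(m):
    """Columns of the m x m identity matrix."""
    return [[1.0 if k == l else 0.0 for k in range(m)] for l in range(m)]

def inner(u, v):
    """Inner product <u, v> = sum_k conj(u_k) v_k."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def coherence(cols):
    """Largest |<c_i, c_j>| over distinct columns of one dictionary."""
    n = len(cols)
    return max((abs(inner(cols[i], cols[j]))
                for i in range(n) for j in range(i + 1, n)), default=0.0)

def mutual_coherence(cols_a, cols_b):
    """Largest |<a_i, b_j>| over columns of two dictionaries."""
    return max(abs(inner(a, b)) for a in cols_a for b in cols_b)

if __name__ == "__main__":
    m = 16
    F, I = dft_columns(m), identity_columns(m)
    print(coherence(F), mutual_coherence(I, F), 1.0 / math.sqrt(m))
```

With $\mu = 0$ for a unitary matrix, the indicator $\mathbb{1}[\mu \ne 0]$ in Lemma 8 removes the second term of the bound, which is the situation exploited in the asymptotic analysis.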

We now state the main result for $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}})$.

Theorem 9: Choose $\beta \ge \log(n_x)$, $q = 4\beta$ and assume that A and B are characterized by the coherence parameters $\mu_a$, $\mu_b$, and $\mu_m$. If i) $\mathcal{X}$ is chosen uniformly at random with cardinality $n_x$, $\mathcal{E}$ is arbitrary, and (4) holds, or ii) $\mathcal{E}$ is chosen uniformly at random with cardinality $n_e$, $\mathcal{X}$ is arbitrary, and (10) holds, or iii) both $\mathcal{X}$ and $\mathcal{E}$ are chosen uniformly at random with cardinalities $n_x$ and $n_e$, respectively, and (5) holds, then

$$P\big\{\|D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}} - I\|_{2,2} \ge \delta\big\} \le e^{-\beta} \tag{16}$$

and if (4), (5), or (10) hold with $\delta = 1$, then

$$P\big\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) = 0\big\} \le e^{-\beta}. \tag{17}$$



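Proof: The proof follows that of [14, Lemma 7]. We start by defining the hollow Gram matrix

$$H = D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}} - I = \begin{bmatrix} A_{\mathcal{X}}^H A_{\mathcal{X}} - I & A_{\mathcal{X}}^H B_{\mathcal{E}} \\ B_{\mathcal{E}}^H A_{\mathcal{X}} & B_{\mathcal{E}}^H B_{\mathcal{E}} - I \end{bmatrix}.$$

Splitting H into diagonal and off-diagonal blocks and applying the triangle inequality leads to

$$\begin{aligned}
\|H\|_{2,2} &\le \left\| \begin{bmatrix} A_{\mathcal{X}}^H A_{\mathcal{X}} - I & 0 \\ 0 & B_{\mathcal{E}}^H B_{\mathcal{E}} - I \end{bmatrix} \right\|_{2,2} + \left\| \begin{bmatrix} 0 & A_{\mathcal{X}}^H B_{\mathcal{E}} \\ B_{\mathcal{E}}^H A_{\mathcal{X}} & 0 \end{bmatrix} \right\|_{2,2} \\
&\le \max\big\{ \|A_{\mathcal{X}}^H A_{\mathcal{X}} - I\|_{2,2},\ \|B_{\mathcal{E}}^H B_{\mathcal{E}} - I\|_{2,2} \big\} + \|B_{\mathcal{E}}^H A_{\mathcal{X}}\|_{2,2} \\
&\le \|A_{\mathcal{X}}^H A_{\mathcal{X}} - I\|_{2,2} + \|B_{\mathcal{E}}^H B_{\mathcal{E}} - I\|_{2,2} + \|B_{\mathcal{E}}^H A_{\mathcal{X}}\|_{2,2}.
\end{aligned}$$

Since the $q$th moment effectively defines an $\ell_q$-norm, it satisfies the triangle inequality, namely, $\mathbb{E}^q[|X + Y|] \le \mathbb{E}^q[|X|] + \mathbb{E}^q[|Y|]$. Hence, it follows that

$$\mathbb{E}^q\big[\|H\|_{2,2}\big] \le \mathbb{E}^q\big[\|A_{\mathcal{X}}^H A_{\mathcal{X}} - I\|_{2,2}\big] + \mathbb{E}^q\big[\|B_{\mathcal{E}}^H B_{\mathcal{E}} - I\|_{2,2}\big] + \mathbb{E}^q\big[\|B_{\mathcal{E}}^H A_{\mathcal{X}}\|_{2,2}\big]. \tag{18}$$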



We now separately bound each of the terms in (18), and we do this for each case where $\mathcal{X}$ and $\mathcal{E}$ are either chosen at random or arbitrarily. If $\mathcal{X}$ is chosen uniformly at random, then it follows from Lemma 8 that

$$\mathbb{E}^q\big[\|A_{\mathcal{X}}^H A_{\mathcal{X}} - I\|_{2,2}\big] \le 12\mu_a\sqrt{\beta n_x} + \mathbb{1}[\mu_a \ne 0]\,\frac{2n_x}{n_a}\,\|A\|_{2,2}^2, \tag{19}$$

for any $4\beta = q \ge 4\log(n_x)$. If $\mathcal{X}$ is allowed to be arbitrary, then for all $\mathcal{X}$ we have

$$\|A_{\mathcal{X}}^H A_{\mathcal{X}} - I\|_{2,2} \le \max_{k} \sum_{j \ne k} \big|[A_{\mathcal{X}}^H A_{\mathcal{X}}]_{k,j}\big| \le (n_x - 1)\mu_a, \tag{20}$$

where the first inequality follows from the Geršgorin disc theorem [39, Thm. 6.1.1] and the second inequality is a consequence of the definition of $\mu_a$. By reversing the roles of A and B, we get the analogous bounds for the right-hand side (RHS) term $\mathbb{E}^q\big[\|B_{\mathcal{E}}^H B_{\mathcal{E}} - I\|_{2,2}\big]$ in (18).
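The deterministic chain in (20) can be checked numerically on a small example. The following is a minimal sketch, assuming a real-valued dictionary with random unit-norm columns and an arbitrary support; all helper names are illustrative, and the spectral norm is only estimated from below by power iteration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit_columns(m, n, rng):
    """n random real columns of length m, each scaled to unit Euclidean norm."""
    cols = []
    for _ in range(n):
        c = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(dot(c, c))
        cols.append([x / norm for x in c])
    return cols

def hollow_gram(cols):
    """Gram matrix of the columns with the diagonal set to zero."""
    n = len(cols)
    return [[0.0 if i == j else dot(cols[i], cols[j]) for j in range(n)]
            for i in range(n)]

def spectral_norm_lower(H, iters=300):
    """Power iteration; ||H v||_2 for a unit vector v never exceeds ||H||_{2,2}."""
    n = len(H)
    v = [1.0 / math.sqrt(n)] * n
    est = 0.0
    for _ in range(iters):
        w = [dot(row, v) for row in H]
        est = math.sqrt(dot(w, w))
        if est == 0.0:
            break
        v = [x / est for x in w]
    return est

rng = random.Random(0)
A = unit_columns(32, 64, rng)
mu_a = max(abs(dot(A[i], A[j])) for i in range(64) for j in range(i + 1, 64))
support = rng.sample(range(64), 5)            # an arbitrary support of size n_x = 5
H = hollow_gram([A[i] for i in support])
norm_est = spectral_norm_lower(H)
gershgorin = max(sum(abs(x) for x in row) for row in H)
coherence_bound = (len(support) - 1) * mu_a
print(norm_est, gershgorin, coherence_bound)
```

The three printed values appear in nondecreasing order, mirroring the two inequalities in (20).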



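For the third summand appearing in the RHS of (18), let us first consider the case where $\mathcal{E}$ is chosen arbitrarily and $\mathcal{X}$ uniformly at random. We then want to apply Theorem 7 to $M = B_{\mathcal{E}}^H A$ and $R_{\mathcal{X}}$. Since $M R_{\mathcal{X}}$ has $n_e$ rows and $n_x$ non-zero columns, $\operatorname{rank}(M R_{\mathcal{X}}) \le \min\{n_x, n_e\}$ and thus we can apply Theorem 7 with $q = 2p = 4\beta$ where $q \ge 4\min\{\log(n_x), \log(n_e)\} \ge 4\log(\operatorname{rank}(M R_{\mathcal{X}}))$ to get

$$\begin{aligned}
\mathbb{E}^q\big[\|B_{\mathcal{E}}^H A_{\mathcal{X}}\|_{2,2}\big] &= \mathbb{E}^q\big[\|B_{\mathcal{E}}^H A R_{\mathcal{X}}\|_{2,2}\big] && \text{(21a)} \\
&\le 3\sqrt{2\beta}\,\|B_{\mathcal{E}}^H A\|_{1,2} + \sqrt{\frac{n_x}{n_a}}\,\|B_{\mathcal{E}}^H A\|_{2,2} \\
&\le 3\mu_m\sqrt{2\beta n_e} + \sqrt{\frac{n_x}{n_a}}\,\|B^H A\|_{2,2}, && \text{(21b)}
\end{aligned}$$

where the entries of $B^H A$ are bounded by the mutual coherence $\mu_m$. The case where $\mathcal{E}$ is random and $\mathcal{X}$ is arbitrary follows by reversing the roles of A and B.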

Now consider the case where both $\mathcal{E}$ and $\mathcal{X}$ are random. We can set $M = B^H A$ so that we may write

$$\mathbb{E}^q\big[\|B_{\mathcal{E}}^H A_{\mathcal{X}}\|_{2,2}\big] = \mathbb{E}^q_{\mathcal{E}}\Big[\mathbb{E}^q_{\mathcal{X}}\big[\|R_{\mathcal{E}}^H M R_{\mathcal{X}}\|_{2,2}\big]\Big]$$

in order to apply Theorem 7 to first bound the inner expectation, and then to bound the resulting outer expectation. However, this approach results in a worse bound compared to reusing (21b), which does not depend on $\mathcal{E}$ and hence holds for all $\mathcal{E}$. By also taking the expectation in (21a) with respect to $\mathcal{E}$ instead of $\mathcal{X}$ and bounding similarly, we get that

$$\mathbb{E}^q\big[\|B_{\mathcal{E}}^H A_{\mathcal{X}}\|_{2,2}\big] \le \min\left\{ 3\mu_m\sqrt{2\beta n_x} + \sqrt{\frac{n_e}{n_b}}\,\|A^H B\|_{2,2},\ \ 3\mu_m\sqrt{2\beta n_e} + \sqrt{\frac{n_x}{n_a}}\,\|A^H B\|_{2,2} \right\}, \tag{22}$$

for any $\beta \ge \min\{\log(n_x), \log(n_e)\}$. Combining (19), (20), (21b), and (22) with the analogous results for B and $\mathcal{E}$ leads to the conditions (4), (5), and (10).
Due to (19) and the analogous result for $B_{\mathcal{E}}$, if $\mathcal{X}$ is chosen at random, we require $\beta \ge \log(n_x)$, if $\mathcal{E}$ is chosen at random we need $\beta \ge \log(n_e)$, and if both $\mathcal{X}$ and $\mathcal{E}$ are chosen at random, both of these conditions need to be satisfied, namely that $\beta \ge \max\{\log(n_x), \log(n_e)\}$.



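We now show that the conditions (4), (5), and (10) are sufficient to show that (16) holds. Chebyshev's inequality [40, Sec. 1.3] states that for a random variable $X$, a nonnegative function $f$, and a set $\mathcal{A}$,

$$P\{X \in \mathcal{A}\} \le \frac{\mathbb{E}[f(X)]}{\inf\{f(x) : x \in \mathcal{A}\}}. \tag{23}$$

Application of (23) with $f(x) = x^q$ and the random variable $X = \|D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}} - I\|_{2,2} = \|H\|_{2,2}$ gives

$$P\big\{\|H\|_{2,2} \ge \delta\big\} \le \frac{\mathbb{E}[X^q]}{\inf\{x^q : x \ge \delta\}} = \frac{\mathbb{E}[X^q]}{\delta^q} \le e^{-q/4},$$

provided that $(\delta e^{-1/4})^q \ge \mathbb{E}[X^q]$. But this is guaranteed by the assumptions in (4), (5), or (10), depending on the signal and interference model. Therefore, we have

$$P\big\{\|H\|_{2,2} \ge \delta\big\} \le e^{-\beta},$$

since $q = 4\beta$. The second part of the theorem, (17), is a result of the fact that $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) = 0$ implies that $\|H\|_{2,2} \ge 1$ and hence, $P\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) = 0\} \le P\{\|H\|_{2,2} \ge 1\}$. ■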
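The moment bound behind this step can be illustrated on sampled data. The following is a minimal sketch, assuming nonnegative samples drawn from an arbitrary illustrative distribution (not the distribution of $\|H\|_{2,2}$); the function names are hypothetical. Because the inequality holds for every distribution, it also holds exactly for the empirical one.

```python
import random

def empirical_tail(samples, delta):
    """Fraction of samples that are at least delta."""
    return sum(1 for x in samples if x >= delta) / len(samples)

def moment_tail_bound(samples, delta, q):
    """Empirical version of E[X^q] / delta^q for nonnegative samples."""
    return sum(x**q for x in samples) / len(samples) / delta**q

rng = random.Random(1)
samples = [abs(rng.gauss(0.0, 0.2)) for _ in range(10_000)]  # stand-in for X >= 0
delta, q = 0.5, 8
tail = empirical_tail(samples, delta)
bound = moment_tail_bound(samples, delta, q)
print(tail, bound)
```

Larger values of $q$ weight the tail more heavily, which is why the proof can trade a bound on the $q$th moment for an exponentially small failure probability $e^{-q/4}$.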

Appendix B 
Both Supports Known 

Proof of Theorem 1: It suffices to show that $D_{\mathcal{X},\mathcal{E}}$ is invertible, which is equivalent to the condition that $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > 0$. By assumption, the conditions of Theorem 9 hold, which implies $P\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) = 0\} \le e^{-\beta}$. Hence, recovery of x and e using (3) succeeds with probability at least $1 - e^{-\beta}$. ■

Appendix C 

(P0) with Limited Support Knowledge 

We now prove the recovery guarantees for (P0*), (P0*, $\mathcal{E}$), and (P0, $\mathcal{X}$) for partial (or no) support-set knowledge of $\mathcal{E}$ and $\mathcal{X}$. We follow the proof of [23] and present the three cases 1) $\mathcal{X}$ known, 2) $\mathcal{E}$ known, and 3) no support-set knowledge, all together, since the corresponding proofs are similar. Note that $\mathcal{R}(D)$ denotes the space spanned by the columns of D.
We begin by generalizing [23, Thm. 13] to the case of pairs of dictionaries A and B where we know the support set of e. The result gives us a sufficient condition for when there is a unique minimizer of (P0*), (P0*, $\mathcal{E}$), or (P0, $\mathcal{X}$).

Lemma 10 (Based on Thm. 13 of [23]): Let $\tilde{A} \in \mathbb{C}^{m \times n_a}$ and $\tilde{B} \in \mathbb{C}^{m \times n_b}$ be two dictionaries and suppose that we observe the signal $z = \tilde{A}x + \tilde{B}e$ where $\mathcal{X} = \operatorname{supp}(x)$ and $\mathcal{E} = \operatorname{supp}(e)$ and the non-zero entries of x and e are drawn from a continuous distribution. Furthermore, suppose that $\mathcal{E}$ is known. Write $\tilde{D} = [\tilde{A}\ \tilde{B}]$ and $\tilde{D}_{\mathcal{X},\mathcal{E}} = [\tilde{A}_{\mathcal{X}}\ \tilde{B}_{\mathcal{E}}]$. If

$$\dim\big(\mathcal{R}(\tilde{D}_{\mathcal{X},\mathcal{E}}) \cap \mathcal{R}(\tilde{D}_{\mathcal{X}',\mathcal{E}})\big) < |\mathcal{X}| + |\mathcal{E}| \tag{25}$$

for all sets $\mathcal{X}' \ne \mathcal{X}$ where $|\mathcal{X}| = |\mathcal{X}'|$, then, almost surely, (P0*, $\mathcal{E}$) recovers the vectors x and e. This result also provides a sufficient condition for (P0*), if we set $\tilde{A} = D$ and take $\tilde{B}$ to be the empty matrix, or for (P0, $\mathcal{X}$), if we set $\tilde{A} = B$ and $\tilde{B} = A$.



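Proof: We follow the proof of [23, Thm. 13]. We begin by defining the set of all alternative representations as follows:

$$\mathcal{U}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'} = \left\{ (x, e) : \begin{array}{l} \tilde{A}x + \tilde{B}e = \tilde{A}x' + \tilde{B}e', \\ \operatorname{supp}(x) = \mathcal{X},\ \operatorname{supp}(x') = \mathcal{X}', \\ \operatorname{supp}(e) = \operatorname{supp}(e') = \mathcal{E} \end{array} \right\}$$

and the set of observations that have alternative representations

$$\mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'} = \big\{ z : z = \tilde{A}_{\mathcal{X}} x_{\mathcal{X}} + \tilde{B}_{\mathcal{E}} e_{\mathcal{E}},\ (x, e) \in \mathcal{U}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'} \big\},$$

so that $\mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'}$ is the set of observations that can be written in terms of two pairs of signals (x, e) and (x', e') where $\mathcal{X} = \operatorname{supp}(x)$, $\mathcal{X}' = \operatorname{supp}(x')$, and $\mathcal{E} = \operatorname{supp}(e) = \operatorname{supp}(e')$. For any $\mathcal{X}'$ of size $|\mathcal{X}|$ and $\mathcal{X}' \ne \mathcal{X}$, we have

$$\mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'} \subseteq \mathcal{R}(\tilde{D}_{\mathcal{X},\mathcal{E}}) \cap \mathcal{R}(\tilde{D}_{\mathcal{X}',\mathcal{E}}).$$

Now assume that (25) holds for $\mathcal{X}$, $\mathcal{X}'$, and $\mathcal{E}$; then $\dim(\mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'}) < |\mathcal{X}| + |\mathcal{E}|$. Thus the smallest subspace containing $\mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'}$ is a strict subspace of $\mathcal{R}(\tilde{D}_{\mathcal{X},\mathcal{E}})$ and hence, has zero measure with respect to any nonatomic measure defined in the range of $\tilde{D}_{\mathcal{X},\mathcal{E}}$. Since x and e, and hence z, have non-zero entries drawn from a continuous distribution,

$$P\big\{ \tilde{A}x + \tilde{B}e = z \in \mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'} \big\} = 0.$$

Thus, almost surely, there exists no alternative pair (x', e') with supports $\mathcal{X}'$ and $\mathcal{E}$, respectively, since otherwise z would lie in $\mathcal{A}^{\mathcal{E}}_{\mathcal{X},\mathcal{X}'}$. Therefore, if (25) holds for all $\mathcal{X}'$, then the probability of choosing random x and e so that z admits an alternative representation is zero, and hence, almost surely, given $z = \tilde{A}x + \tilde{B}e$, (P0*, $\mathcal{E}$) returns the vectors x and e. ■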

We can use Lemma 10 to prove the first part of Theorems 2, 3, 4, 5, and 6 by showing that (25) holds with high probability. To show that (25) holds for all $\mathcal{X}'$, we show for every column $a_\gamma$ of $\tilde{A}$ not in $\tilde{A}_{\mathcal{X}}$ (i.e., for all $\gamma \notin \mathcal{X}$) that $a_\gamma \notin \mathcal{R}(D_{\mathcal{X},\mathcal{E}})$, which is equivalent to showing that

$$\|P_{\mathcal{X},\mathcal{E}}\, a_\gamma\|_2 < \|a_\gamma\|_2, \tag{26}$$

for all $\gamma \notin \mathcal{X}$ and where $P_{\mathcal{X},\mathcal{E}} = (D_{\mathcal{X},\mathcal{E}}^{\dagger})^H D_{\mathcal{X},\mathcal{E}}^H$ is the projection onto the range space of $D_{\mathcal{X},\mathcal{E}}$.

We will now bound the probability that (26) holds for the following three situations: 1) only $\mathcal{E}$ known, 2) only $\mathcal{X}$ known, and 3) both support sets unknown.

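1) Only $\mathcal{E}$ known: Consider the setting where $\mathcal{E}$ is known, but $\mathcal{X}$ is unknown; this case fits the setting of Lemma 10 with $\tilde{A} = A$ and $\tilde{B} = B$. Hence, the condition (26) is equivalent to $\|P_{\mathcal{X},\mathcal{E}}\, a_\gamma\|_2 < \|a_\gamma\|_2 = 1$. We have

$$\|P_{\mathcal{X},\mathcal{E}}\, a_\gamma\|_2 \le \big\|(D_{\mathcal{X},\mathcal{E}}^{\dagger})^H\big\|_{2,2}\, \big\|D_{\mathcal{X},\mathcal{E}}^H a_\gamma\big\|_2 \le \sigma_{\min}^{-1}(D_{\mathcal{X},\mathcal{E}}) \sqrt{\|A_{\mathcal{X}}^H a_\gamma\|_2^2 + \|B_{\mathcal{E}}^H a_\gamma\|_2^2}.$$

From the definitions of the coherence parameters,¹⁰

$$\big\|D_{\mathcal{X},\mathcal{E}}^H a_\gamma\big\|_2 \le \xi_{\mathcal{E}} = \sqrt{\mu_a^2 n_x + \mu_m^2 n_e}. \tag{27}$$

Thus, in order to guarantee $\|P_{\mathcal{X},\mathcal{E}}\, a_\gamma\|_2 < 1$ it suffices to have

$$\xi_{\mathcal{E}} < \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}). \tag{28}$$

¹⁰ Note that we use bounds that hold for all $\mathcal{X}$, rather than a bound that holds with high probability. The underlying reason is the fact that if A is an equiangular tight frame, the associated inequalities hold with equality and hence, we cannot do any better by using probabilistic bounds, unless we take advantage of a property of A other than the coherence $\mu_a$.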
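2) Only $\mathcal{X}$ known: For the setting where only $\mathcal{X}$ is known, we apply Lemma 10 with $\tilde{A} = B$ and $\tilde{B} = A$, thus the condition of (25) becomes $\dim\big(\mathcal{R}(D_{\mathcal{X},\mathcal{E}}) \cap \mathcal{R}(D_{\mathcal{X},\mathcal{E}'})\big) < |\mathcal{X}| + |\mathcal{E}|$, and so we only want to show that $\|P_{\mathcal{X},\mathcal{E}}\, b_\gamma\|_2 < \|b_\gamma\|_2$ for all $\gamma \notin \mathcal{E}$. Proceeding as before, it follows that

$$\|P_{\mathcal{X},\mathcal{E}}\, b_\gamma\|_2 \le \sigma_{\min}^{-1}(D_{\mathcal{X},\mathcal{E}})\, \big\|D_{\mathcal{X},\mathcal{E}}^H b_\gamma\big\|_2 \le \sigma_{\min}^{-1}(D_{\mathcal{X},\mathcal{E}})\, \xi_{\mathcal{X}}, \tag{29}$$

where $\xi_{\mathcal{X}} = \sqrt{\mu_m^2 n_x + \mu_b^2 n_e}$. Hence, it suffices to show that

$$\xi_{\mathcal{X}} < \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}). \tag{30}$$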

3) No support-set knowledge: Finally, we consider the setting where neither $\mathcal{X}$ nor $\mathcal{E}$ is known, so we apply Lemma 10 with $\tilde{A} = [A\ B]$ and $\tilde{B}$ being the empty matrix; this is exactly the condition of [23, Thm. 13]. Then, we show that $\|P_{\mathcal{X},\mathcal{E}}\, d_\gamma\|_2 < \|d_\gamma\|_2$ for any column $d_\gamma$ of D not in $D_{\mathcal{X},\mathcal{E}}$. In other words, we want both (27) and (29) to hold, as $d_\gamma$ can be a column of either A or B. So it suffices to show

$$\|P_{\mathcal{X},\mathcal{E}}\, d_\gamma\|_2 \le \sigma_{\min}^{-1}(D_{\mathcal{X},\mathcal{E}})\, \xi_{+} < 1, \tag{31}$$

where $\xi_{+} = \max\{\xi_{\mathcal{E}}, \xi_{\mathcal{X}}\}$.

Finally, to show that the (P0) based problems succeed, we want to bound the probability that (28), (30), or (31) holds (depending on which, if any, support sets we know). In each of the cases, we know that (P0*), (P0*, $\mathcal{E}$), or (P0, $\mathcal{X}$) returns the correct solution if $\xi < \sigma_{\min}(D_{\mathcal{X},\mathcal{E}})$, where $\xi \in (0, 1)$ is equal to $\xi_{\mathcal{E}}$, $\xi_{\mathcal{X}}$, or $\xi_{+}$ (as appropriate to the case). Hence, we can bound the probability of error as follows:

$$P\{\text{error}\} \le P\big\{\xi \ge \sigma_{\min}(D_{\mathcal{X},\mathcal{E}})\big\} \le P\big\{\|D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}} - I\|_{2,2} \ge 1 - \xi^2\big\} \le e^{-\beta},$$

where we use Theorem 9 with $\delta = 1 - \xi^2$. Therefore, with probability exceeding $1 - e^{-\beta}$, the pair (x, e) is the unique minimizer.
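The quantities entering this argument depend only on the coherence parameters and the sparsity levels. The following is a minimal sketch, assuming the definitions $\xi_{\mathcal{E}}^2 = \mu_a^2 n_x + \mu_m^2 n_e$ and $\xi_{\mathcal{X}}^2 = \mu_m^2 n_x + \mu_b^2 n_e$ from the three cases above; the function names `xi_values` and `delta_for_p0` are hypothetical, and the numerical inputs are illustrative only.

```python
import math

def xi_values(mu_a, mu_b, mu_m, n_x, n_e):
    """Return (xi_E, xi_X, xi_plus) for the three support-knowledge cases."""
    xi_e = math.sqrt(mu_a**2 * n_x + mu_m**2 * n_e)   # only E known
    xi_x = math.sqrt(mu_m**2 * n_x + mu_b**2 * n_e)   # only X known
    return xi_e, xi_x, max(xi_e, xi_x)                # no support-set knowledge

def delta_for_p0(xi):
    """delta = 1 - xi^2, the value used when invoking Theorem 9."""
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie in (0, 1)")
    return 1.0 - xi**2

if __name__ == "__main__":
    xi_e, xi_x, xi_plus = xi_values(0.0, 1e-4, 1e-4, n_x=1000, n_e=100)
    print(xi_e, xi_x, xi_plus, delta_for_p0(xi_plus))
```

A smaller $\xi$ leaves a larger $\delta$, which in turn makes the conditions of Theorem 9 easier to satisfy.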
Appendix D
(BP) with Limited Support Knowledge

We now prove the recovery results for the (BP) based algorithms. To do this, we restate the sufficient recovery condition of [41] and then show when we can satisfy this condition, thereby guaranteeing the successful recovery of x with (BP*, $\mathcal{E}$), (BP, $\mathcal{X}$), or (BP*).

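Theorem 11 (Thm. 5 of [41]): Suppose that the sparsest representation of a complex vector z is $D_{\mathcal{S}} s_{\mathcal{S}}$. If $D_{\mathcal{S}}$ is full rank and there exists a vector $h \in \mathbb{C}^m$ such that

$$D_{\mathcal{S}}^H h = \operatorname{sign}(s_{\mathcal{S}}), \text{ and} \tag{32a}$$
$$|\langle h, d_\gamma \rangle| < 1 \text{ for all columns } d_\gamma \text{ of } D \text{ not in } D_{\mathcal{S}}, \tag{32b}$$

then s is the unique minimizer of (BP).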



We can easily apply Theorem 11 to obtain recovery conditions for (BP*, $\mathcal{E}$), (BP, $\mathcal{X}$), and (BP*). For (BP*, $\mathcal{E}$), we apply Theorem 11 to the matrix $D = [A\ B_{\mathcal{E}}]$ so that the two problems (BP) and (BP*, $\mathcal{E}$) are the same. We want to show that $s^T = [x_{\mathcal{X}}^T\ e_{\mathcal{E}}^T]$ is the sparsest representation of the observation z. By rewriting (32a) and (32b), it follows that recovery with (BP*, $\mathcal{E}$) is guaranteed if there exists a vector $h \in \mathbb{C}^m$ such that

$$[A_{\mathcal{X}}\ B_{\mathcal{E}}]^H h = \operatorname{sign}\!\left(\begin{bmatrix} x_{\mathcal{X}} \\ e_{\mathcal{E}} \end{bmatrix}\right), \text{ and} \tag{33a}$$
$$|\langle h, a_\gamma \rangle| < 1 \text{ for all columns } a_\gamma \text{ of } A \text{ not in } A_{\mathcal{X}}. \tag{33b}$$

Similarly, to get a recovery condition for (BP*), we merely apply Theorem 11 to the matrix $D = [A\ B]$.

Finally, before we can prove the probabilistic recovery guarantees for the $\ell_1$-norm-based algorithms of Theorems 2, 3, 4, 5, and 6, we require the following lemma.



Lemma 12 (Bernstein's Inequality, Prop. 16 of [23]): Let $v \in \mathbb{C}^n$ and let $\varepsilon \in \mathbb{C}^n$ be a Steinhaus sequence. Then, for $u \ge 0$ we have

$$P\left\{ \left| \sum_{i=1}^{n} \varepsilon_i v_i \right| \ge u\,\|v\|_2 \right\} \le 2\exp\!\left(-\frac{u^2}{2}\right). \tag{34}$$

A Steinhaus sequence is a (countable) collection of independent complex-valued random variables, whose entries are uniformly distributed on the unit circle [23].
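The tail bound in (34) can be compared against a simple Monte Carlo estimate. The following is a minimal sketch, assuming a fixed, arbitrarily chosen vector v and independent phases drawn uniformly from $[0, 2\pi)$; the helper names are illustrative, and the simulation only illustrates the inequality rather than proving it.

```python
import cmath
import math
import random

def steinhaus(n, rng):
    """n independent entries uniformly distributed on the complex unit circle."""
    return [cmath.exp(2j * math.pi * rng.random()) for _ in range(n)]

def steinhaus_tail(v, u, trials, rng):
    """Empirical estimate of P{|sum_i eps_i v_i| >= u ||v||_2}."""
    norm_v = math.sqrt(sum(abs(x) ** 2 for x in v))
    hits = 0
    for _ in range(trials):
        eps = steinhaus(len(v), rng)
        if abs(sum(e * x for e, x in zip(eps, v))) >= u * norm_v:
            hits += 1
    return hits / trials

def bernstein_bound(u):
    """Right-hand side of (34)."""
    return 2.0 * math.exp(-u ** 2 / 2.0)

rng = random.Random(2)
v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
estimate = steinhaus_tail(v, 2.0, 2000, rng)
print(estimate, bernstein_bound(2.0))
```

In such experiments the empirical tail is typically well below the bound, which is expected since (34) is a worst-case statement over all vectors v.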
We now prove the second part of Theorems 2, 3, 4, 5, and 6. To show that recovery with (BP*), (BP*, $\mathcal{E}$), or (BP, $\mathcal{X}$) succeeds, we demonstrate that the vector h, as in Theorem 11, exists with high probability. We now consider the following three settings in turn: 1) only $\mathcal{E}$ known, 2) only $\mathcal{X}$ known, and 3) both support sets unknown. But first, let us assume that in each case $D_{\mathcal{X},\mathcal{E}}$ is full rank.

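1) Only $\mathcal{E}$ known: Consider the case where $\mathcal{E}$ is known but $\mathcal{X}$ is unknown. We show that a vector h exists that satisfies (33a) and (33b) with high probability. To this end, set $h = D_{\mathcal{X},\mathcal{E}} (D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1} \operatorname{sign}(s_{\mathcal{X},\mathcal{E}})$, so that (33a) is satisfied. Then, for any column $a_\gamma$ of A with $\gamma \notin \mathcal{X}$, we have

$$|\langle h, a_\gamma \rangle| = \big|\big\langle D_{\mathcal{X},\mathcal{E}} (D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1} \operatorname{sign}(s_{\mathcal{X},\mathcal{E}}),\ a_\gamma \big\rangle\big| = \big|\big\langle \operatorname{sign}(s_{\mathcal{X},\mathcal{E}}),\ (D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1} D_{\mathcal{X},\mathcal{E}}^H a_\gamma \big\rangle\big| = |\langle \varepsilon, v_\gamma \rangle|$$

with $\varepsilon = \operatorname{sign}(s_{\mathcal{X},\mathcal{E}})$ and $v_\gamma = (D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1} D_{\mathcal{X},\mathcal{E}}^H a_\gamma$. Since $\varepsilon$ is a Steinhaus sequence (by assumption), we can apply Lemma 12 with $u = \|v_\gamma\|_2^{-1}$ to arrive at

$$P\big\{ |\langle \varepsilon, v_\gamma \rangle| \ge 1 \big\} \le 2\exp\!\left(-\frac{1}{2\|v_\gamma\|_2^2}\right). \tag{35}$$

But we have that

$$\|v_\gamma\|_2 = \big\|(D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1} D_{\mathcal{X},\mathcal{E}}^H a_\gamma\big\|_2 \le \big\|(D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1}\big\|_{2,2}\, \big\|D_{\mathcal{X},\mathcal{E}}^H a_\gamma\big\|_2 \le \sigma_{\min}^{-2}(D_{\mathcal{X},\mathcal{E}})\, \xi_{\mathcal{E}},$$

where $\xi_{\mathcal{E}}^2 = n_x \mu_a^2 + n_e \mu_m^2$. Hence, (35) results in

$$P\big\{ |\langle \varepsilon, v_\gamma \rangle| \ge 1 \big\} \le 2\exp\!\left(-\frac{\sigma_{\min}^4(D_{\mathcal{X},\mathcal{E}})}{2\xi_{\mathcal{E}}^2}\right).$$

Now we want (32b) to hold for all $\gamma \notin \mathcal{X}$. Hence, applying the union bound to the result above leads to

$$P\Big\{ \max_{\gamma \notin \mathcal{X}} |\langle \varepsilon, v_\gamma \rangle| \ge 1 \Big\} \le 2 n_a \exp\!\left(-\frac{\sigma_{\min}^4(D_{\mathcal{X},\mathcal{E}})}{2\xi_{\mathcal{E}}^2}\right). \tag{36}$$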



March 1, 2013 DRAFT



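2) Only $\mathcal{X}$ known: Consider the setting where $\mathcal{X}$ is known, but $\mathcal{E}$ is unknown. This setting follows exactly as in the setting where $\mathcal{E}$ is known and $\mathcal{X}$ is unknown by switching the roles of $\mathcal{X}$ and $\mathcal{E}$. Thus, we arrive at

$$P\Big\{ \max_{\gamma \notin \mathcal{E}} |\langle \varepsilon, v_\gamma \rangle| \ge 1 \Big\} \le 2 n_b \exp\!\left(-\frac{\sigma_{\min}^4(D_{\mathcal{X},\mathcal{E}})}{2\xi_{\mathcal{X}}^2}\right), \tag{37}$$

where $\xi_{\mathcal{X}}^2 = n_x \mu_m^2 + n_e \mu_b^2$ and $v_\gamma = D_{\mathcal{X},\mathcal{E}}^{\dagger} b_\gamma$.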

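3) No support-set knowledge: Finally, we consider the third setting where neither $\mathcal{X}$ nor $\mathcal{E}$ is known. In particular, we want to show that in Theorem 11 we can satisfy (32a) and (32b) with high probability. For any column $d_\gamma$ of D not in $D_{\mathcal{X},\mathcal{E}}$, set $v_\gamma = D_{\mathcal{X},\mathcal{E}}^{\dagger} d_\gamma$. In this case, we have

$$\|v_\gamma\|_2 \le \big\|(D_{\mathcal{X},\mathcal{E}}^H D_{\mathcal{X},\mathcal{E}})^{-1}\big\|_{2,2}\, \big\|D_{\mathcal{X},\mathcal{E}}^H d_\gamma\big\|_2 \le \sigma_{\min}^{-2}(D_{\mathcal{X},\mathcal{E}})\, \xi_{+},$$

where $\xi_{+}^2 = \max\{n_x \mu_a^2 + n_e \mu_m^2,\ n_x \mu_m^2 + n_e \mu_b^2\}$ and hence,

$$P\big\{ |\langle \varepsilon, v_\gamma \rangle| \ge 1 \big\} \le 2\exp\!\left(-\frac{\sigma_{\min}^4(D_{\mathcal{X},\mathcal{E}})}{2\xi_{+}^2}\right).$$

Finally, we want (32b) to hold for all $d_\gamma$. Therefore, applying the union bound to the result above leads to

$$P\Big\{ \max_{\gamma} |\langle \varepsilon, v_\gamma \rangle| \ge 1 \Big\} \le 2(n_a + n_b)\exp\!\left(-\frac{\sigma_{\min}^4(D_{\mathcal{X},\mathcal{E}})}{2\xi_{+}^2}\right). \tag{38}$$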

We now want to derive an upper bound on the right-hand sides of (36), (37), and (38). First we calculate the probability conditioned on $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda \in (0, 1)$. Note that if $\lambda > 0$, then $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda > 0$ and we satisfy the remaining assumption of Theorem 11, namely that $D_{\mathcal{X},\mathcal{E}}$ is full rank.

For convenience, in the case where $\mathcal{E}$ is known, let us set $N = n_a$ and $\xi = \xi_{\mathcal{E}}$. In the case where $\mathcal{X}$ is known, set $N = n_b$ and $\xi = \xi_{\mathcal{X}}$ and finally, in the case where neither $\mathcal{X}$ nor $\mathcal{E}$ is known, set $N = n_a + n_b$ and $\xi = \xi_{+}$.

Thus, we have

$$P\Big\{ \max_{\gamma} |\langle \varepsilon, v_\gamma \rangle| \ge 1 \,\Big|\, \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda \Big\} \le 2N\exp\!\left(-\frac{\lambda^4}{2\xi^2}\right) \le 2e^{-\beta} \tag{39}$$

for any $\beta \le \lambda^4/(2\xi^2) - \log N$.



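For our particular choice of h, (33a) (in the case where $\mathcal{X}$ or $\mathcal{E}$ is known) or (32a) (in the case where both supports are unknown) will always be satisfied. So let $\mathfrak{E}$ be the event that (33b) (in the case where one support is known) or (32b) (in the case where both supports are unknown) is not fulfilled with our choice of h, and let $\mathfrak{R}$ be the event that $D_{\mathcal{X},\mathcal{E}}$ is not full rank. As $\mathfrak{E} \cup \mathfrak{R}$ is a necessary condition for the (BP) based algorithms not to be able to recover the vectors x and e, $P\{\mathfrak{E} \cup \mathfrak{R}\}$ is an upper bound on the probability of error. Then, since $\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda > 0$ implies that $\mathfrak{R}$ cannot occur, and hence that $P\{\mathfrak{E} \cup \mathfrak{R} \mid \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda\} = P\{\mathfrak{E} \mid \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda\}$, we have that for any $\lambda > 0$

$$\begin{aligned}
P\{\mathfrak{E} \cup \mathfrak{R}\} &= P\{\mathfrak{E} \cup \mathfrak{R} \mid \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda\}\, P\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda\} \\
&\quad + P\{\mathfrak{E} \cup \mathfrak{R} \mid \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) \le \lambda\}\, P\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) \le \lambda\} \\
&\le P\{\mathfrak{E} \mid \sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) > \lambda\} + P\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) \le \lambda\}.
\end{aligned} \tag{40}$$

We can bound the first summand in (40) using (39) under the assumption that $\beta \le \lambda^4/(2\xi^2) - \log N$. The second term we can bound using Theorem 9 with $\delta = 1 - \lambda^2 \in (0, 1)$, which, provided that $\beta \ge \log(N')$ where $N'$ is the size of the supports chosen at random, says that $P\{\sigma_{\min}(D_{\mathcal{X},\mathcal{E}}) \le \lambda\} \le e^{-\beta}$. Therefore, we have

$$P\{\mathfrak{E} \cup \mathfrak{R}\} \le 3e^{-\beta}, \tag{41}$$

and hence, we can recover x and e with probability at least $1 - 3e^{-\beta}$.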
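The final bookkeeping step can be turned into a small calculation. The following is a minimal sketch, assuming only the constraint $\beta \le \lambda^4/(2\xi^2) - \log N$ from (39); it does not check the separate conditions of Theorem 9 (with $\delta = 1 - \lambda^2$) or the requirement on the random support sizes, and the function names and numerical inputs are hypothetical.

```python
import math

def bp_beta_limit(lam, xi, N):
    """Largest beta allowed by (39): lambda^4 / (2 xi^2) - log N."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie in (0, 1)")
    return lam**4 / (2.0 * xi**2) - math.log(N)

def bp_failure_bound(beta):
    """Upper bound 3 e^{-beta} on the failure probability from (41)."""
    return 3.0 * math.exp(-beta)

if __name__ == "__main__":
    beta = bp_beta_limit(lam=0.9, xi=0.05, N=10**6)
    print(beta, bp_failure_bound(beta))
```

The trade-off is visible directly: a larger $\lambda$ raises the admissible $\beta$ here, but it also shrinks $\delta = 1 - \lambda^2$ and thereby tightens the conditions needed for Theorem 9.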